Abstract. We develop a systematic matrix-analytic approach, based on
intertwinings of Markov semigroups, for proving theorems about hitting-time
distributions for finite-state Markov chains. The approach (sometimes) deepens
understanding of the theorems by providing corresponding
sample-path-by-sample-path stochastic constructions. We employ our approach to
give new proofs and constructions for two theorems due to Mark Brown, theorems
giving two quite different representations of hitting-time distributions for
finite-state Markov chains started in stationarity. The proof, and corresponding
construction, for one of the two theorems elucidates an intriguing connection
between hitting-time distributions and the interlacing eigenvalues theorem for
bordered symmetric matrices.



1. Introduction and Outline of our General Technique 

Recently, stochastic proofs and constructions have been provided for some
theorems that give explicit descriptions of Markov chain hitting-time
distributions; previously known proofs of the theorems had been analytic in
nature. Specifically, Fill [16] and Diaconis and Miclo [10] both give stochastic
constructions for a famous birth-and-death hitting-time result first proven
analytically by Karlin and McGregor [19] in 1959. Fill [15] (see also Miclo [23])
extends to upward-skip-free and more general chains, in particular giving a
(sometimes) stochastic proof for a hitting-time theorem for upward-skip-free
chains established analytically by Brown and Shao [7].

In Sections 1.1-1.3 we describe a systematic approach, using intertwinings of
Markov semigroups, for obtaining simple stochastic decompositions of the
distributions of hitting times for Markov chains and also providing
sample-path-by-sample-path constructions for the individual components in these
decompositions. For example, if one can prove a theorem that the law of a
certain Markov chain hitting time T is a convolution of Geometric distributions
with certain parameters, our additional goal is to decompose T explicitly,
sample path by sample path, as a sum of independent Geometric random variables
with the specified parameters; this deepens understanding as to "why" the
theorem is true. See Fill [15] for a class of examples using this approach. Our
approach is essentially matrix-analytic, but if certain conditions elaborated in
Sections 1.1-1.2 are met, then our method also yields a decomposition for each
sample path. For the applications discussed in this paper, our approach provides
new matrix-analytic proofs for hitting-time results which were previously only
known via analytic methods (such as computation of Laplace transforms), and
these new proofs provide new insights into the evolution of the Markov chain. A
simple example of our approach, with an application to the Moran model in
population genetics, is presented in Section 2.

We then employ our intertwinings approach to provide new proofs for two
theorems due to Mark Brown, providing two quite different representations of
hitting-time distributions for Markov chains started in stationarity. The proof,
and subsequent construction, for the first theorem (Section 3) will elucidate an
interesting connection between hitting-time distributions and the interlacing
eigenvalues theorem for bordered symmetric matrices. Application of our approach
obtains a construction for the second theorem (Section 4) that results in a
bonus: We are able to extend Brown's theorem from reversible chains to more
general ones.

Notation: Throughout this paper, all vectors used are by default row vectors. We
write $\delta_j$ for the vector of 0's except for a 1 in the jth position, and
$\mathbf{1}$ for the vector of 1's. The transpose of a matrix A is denoted by
$A^T$. The notation $A(:,j) := A\delta_j^T$ is used to denote the jth column of
A, and $A(i,:) := \delta_i A$ to denote the ith row of A. For any matrix A, we
let $A_0$ denote the principal submatrix of A obtained by deleting the topmost
row and leftmost column.

1.1. Intertwinings and sample-path linking. The main conceptual tool in our
approach is the notion of an intertwining of Markov semigroups, for which we now
provide the needed background in the context (sufficient for our purposes) of
finite-state Markov chains. For further background on intertwinings, see [4],
[8], [26]. Suppose that we have two state spaces, the first ("primary") of size
$n$ and the second ("dual") of size $\hat n$. Let $P$ be the transition matrix of
a Markov chain $X$, begun in distribution $\pi_0$, on the primary state space.
[We write $X \sim (\pi_0, P)$ as shorthand.] Similarly, let $\hat P$ be the
transition matrix of a Markov chain $\hat X$, begun in $\hat\pi_0$, on the dual
state space. Let $\Lambda$ be an $\hat n$-by-$n$ stochastic matrix.

Definition 1.1. We say that the Markov semigroups $(P^t)_{t=0,1,2,\dots}$ and
$(\hat P^t)_{t=0,1,2,\dots}$ are intertwined by the link $\Lambda$ (or, for
short, that $P$ and $\hat P$ are intertwined by the link $\Lambda$) if

    $\Lambda P = \hat P \Lambda$;

and we say that $(\pi_0, P)$ and $(\hat\pi_0, \hat P)$ are intertwined by
$\Lambda$ if additionally

    $\pi_0 = \hat\pi_0 \Lambda$.

Here are three consequences when $(\pi_0, P)$ and $(\hat\pi_0, \hat P)$ are
intertwined by $\Lambda$ (the first two are immediate, for example
$\Lambda P^2 = \hat P \Lambda P = \hat P^2 \Lambda$, and the third is crucial for
our purposes):

• For $t = 0, 1, 2, \dots$, we have $\Lambda P^t = \hat P^t \Lambda$.

• For $t = 0, 1, 2, \dots$, the distributions $\pi_t$ and $\hat\pi_t$ at time $t$
satisfy $\pi_t = \hat\pi_t \Lambda$.

• Given $X \sim (\pi_0, P)$, one can build $\hat X_t$ from $X_0, \dots, X_t$ and
randomness independent of $X$ so that $\hat X \sim (\hat\pi_0, \hat P)$ and the
conditional law of $X_t$ given $(\hat X_0, \dots, \hat X_t)$ has probability mass
function given by the $\hat X_t$-row of $\Lambda$:

(1.1)    $\mathcal{L}(X_t \mid \hat X_0, \dots, \hat X_t) = \Lambda(\hat X_t, \cdot), \qquad t = 0, 1, 2, \dots$

We call this last consequence sample-path linking, and will explain next, once
and for all, (a) how it is done and (b) why it is useful for hitting-time (or
mixing-time) constructions. We will then have no need to repeat this discussion
when we turn to applications, each of which will therefore culminate with the
explicit construction of an intertwining (or at least of a quasi-intertwining,
as discussed in Section 1.3).

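Whenever we have an intertwining of $(\pi_0, P)$ and $(\hat\pi_0, \hat P)$,
Section 2.4 of the strong stationary duality paper [9] by Diaconis and Fill gives
a family of ways to create sample-path linking. Here is one [9, eq. (2.36)], with
$\Delta := \hat P \Lambda = \Lambda P$ and with $x_0, x_1, \dots$ denoting the
observed values of $X_0, X_1, \dots$:

• Set $\hat X_0 \leftarrow \hat x_0$ with probability
$\hat\pi_0(\hat x_0)\Lambda(\hat x_0, x_0)/\pi_0(x_0)$.

• Inductively, for $t \ge 1$, set $\hat X_t \leftarrow \hat x_t$ with probability

    $\hat P(\hat x_{t-1}, \hat x_t)\Lambda(\hat x_t, x_t)/\Delta(\hat x_{t-1}, x_t)$.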
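The two-step recipe above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the three-state primary chain, the two-state dual chain, and the link `LAM` are hypothetical toy data chosen so that $\Lambda P = \hat P\Lambda$ holds exactly, and the helper names (`matmul`, `draw`, `linked_paths`) are made up for this sketch.

```python
from fractions import Fraction as F
import random

def matmul(A, B):
    """Exact product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical toy data (not from the paper): primary states {0, 1, 2},
# dual states {0, 1}; state 0 is absorbing in both chains.
P = [[F(1), F(0), F(0)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
P_HAT = [[F(1), F(0)],
         [F(1, 4), F(3, 4)]]
LAM = [[F(1), F(0), F(0)],
       [F(0), F(1, 2), F(1, 2)]]
PI0_HAT = [F(0), F(1)]
PI0 = matmul([PI0_HAT], LAM)[0]   # pi_0 = pi_0_hat * Lambda
DELTA = matmul(P_HAT, LAM)        # Delta = P_hat * Lambda (= Lambda * P)

def draw(weights, rng):
    """Sample an index from a probability vector."""
    u, acc = rng.random(), 0
    for state, w in enumerate(weights):
        acc += w
        if u < acc:
            return state
    return len(weights) - 1

def linked_paths(steps, rng):
    """Run X for `steps` steps and build the dual path alongside it."""
    x = draw(PI0, rng)
    # Initial step of the linking recipe.
    xh = draw([PI0_HAT[h] * LAM[h][x] / PI0[x] for h in range(len(P_HAT))], rng)
    xs, xhs = [x], [xh]
    for _ in range(steps):
        x = draw(P[x], rng)       # the primary chain moves first
        # Inductive step: uses the previous dual state and the new primary state.
        xh = draw([P_HAT[xh][h] * LAM[h][x] / DELTA[xh][x]
                   for h in range(len(P_HAT))], rng)
        xs.append(x)
        xhs.append(xh)
    return xs, xhs
```

Calling `linked_paths(40, random.Random(0))` returns a primary path and a dual path of equal length. In this toy example the link is so simple that the dual state is determined by the primary state; a less degenerate link would make the dual step genuinely random.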

Suppose $(\pi_0, P)$ and $(\hat\pi_0, \hat P)$ are intertwined and that, given
$X \sim (\pi_0, P)$, we have created linked sample paths for
$\hat X \sim (\hat\pi_0, \hat P)$, as at (1.1). Suppose further that there are
states, call them $0$ and $\hat 0$, such that $0$ (respectively, $\hat 0$) is the
unique absorbing state for $P$ (resp., $\hat P$) and that

(1.2)    $\Lambda \delta_0^T = \delta_{\hat 0}^T$,

i.e., that $\Lambda(\hat 0, 0) = 1$ and $\Lambda(\hat x, 0) = 0$ for
$\hat x \ne \hat 0$. Then, for the bivariate process $(\hat X, X)$, we see that
absorption times agree: $T_0(X) = T_{\hat 0}(\hat X)$. For a parallel explanation
of how sample-path linking can be used to connect the mixing time for an ergodic
primary chain with a hitting time for a dual chain, consult [9]; very closely
related is the FMMR perfect sampling algorithm [14], [17].

1.2. Strategy for absorption-time decompositions. The two hitting-time theorems
discussed in Sections 3-4 both concern ergodic Markov chains. However, since for
these theorems we have no interest in the chain after the specified target state
$0$ has been hit, the hitting-time distribution for such a chain is the same as
the absorption-time distribution for the corresponding chain for which the target
state $0$ is converted to absorbing by replacing the row of $P$ corresponding to
state $0$ by the row vector $\delta_0$.

It should also be noted that hitting-time theorems and stochastic constructions 
are easily extended to hitting times of general subsets A, by the standard trick of 
collapsing A to a single state. 

Here is then a general strategy for obtaining a decomposition of the time to
absorption in state $0$ of a Markov chain $X \sim (\pi_0, P)$ from a
decomposition of its distribution:

1. Discover another chain $\hat X \sim (\hat\pi_0, \hat P)$ for which the
   sample-point-wise decomposition of the time to absorption in state $\hat 0$
   is clearly of the form specified for $X$. (For example, for a pure-death
   chain started at $d$ with absorbing state $\hat 0 = 0$, the time to absorption
   is clearly the sum of independent Geometric random variables.)

2. Find a link $\Lambda$ that intertwines $(\pi_0, P)$ and $(\hat\pi_0, \hat P)$.

3. Prove the condition (1.2).

4. Conclude from the preceding discussion that (after sample-path linking)
   $T_0(X) = T_{\hat 0}(\hat X)$ and use the sample-point-wise decomposition for
   $T_{\hat 0}(\hat X)$ as the decomposition for $T_0(X)$.

An early use of our strategy (adapted for mixing times, rather than absorption
times) was in connection with the theory of strong stationary duality [9], for
which the fullest development has resulted in the case of set-valued strong
stationary duality (see especially [9, Secs. 3-4] and [14]; very closely related
is the technique of evolving sets [25]). For a very recent application to
hitting times and fastest strong stationary times for birth and death chains,
see [16] and [15].
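The pure-death example in step 1 of the strategy can be checked exactly. The sketch below is an illustration under stated assumptions: the holding probabilities are hypothetical, the function names are made up here, and Geo(p) is taken to live on {1, 2, ...}. It compares the absorption-time distribution computed from the chain with that of a sum of independent Geometric variables.

```python
from fractions import Fraction as F

def absorption_cdf(stay, t_max):
    """P(T <= t), t = 0..t_max, for the pure-death chain started at d = len(stay).

    stay[i-1] is the holding probability of state i; from state i the chain
    moves to i - 1 with probability 1 - stay[i-1]; state 0 is absorbing.
    """
    d = len(stay)
    dist = [F(0)] * d + [F(1)]            # unit mass at state d
    cdf = []
    for _ in range(t_max + 1):
        cdf.append(dist[0])               # P(X_t = 0) = P(T <= t)
        new = [F(0)] * (d + 1)
        new[0] = dist[0]
        for i in range(1, d + 1):
            new[i] += dist[i] * stay[i - 1]
            new[i - 1] += dist[i] * (1 - stay[i - 1])
        dist = new
    return cdf

def geometric_sum_cdf(stay, t_max):
    """CDF of a sum of independent Geo(1 - q) variables on {1, 2, ...}, q in stay."""
    pmf = [F(1)] + [F(0)] * t_max         # unit mass at 0
    for q in stay:
        geo = [F(0)] + [(1 - q) * q ** (k - 1) for k in range(1, t_max + 1)]
        pmf = [sum(pmf[j] * geo[t - j] for j in range(t + 1))
               for t in range(t_max + 1)]
    cdf, acc = [], F(0)
    for p in pmf:
        acc += p
        cdf.append(acc)
    return cdf
```

For instance, `absorption_cdf([F(1, 2), F(2, 3), F(1, 5)], 12)` and `geometric_sum_cdf([F(1, 2), F(2, 3), F(1, 5)], 12)` should agree entry by entry, since the chain spends an independent Geometric time in each of the states 3, 2, 1.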

1.3. Quasi-intertwinings. Suppose that the (algebraic) intertwining conditions
$\Lambda P = \hat P \Lambda$ and $\pi_0 = \hat\pi_0 \Lambda$ hold for some not
necessarily stochastic matrix $\Lambda$ with rows summing to unity. We call this
a quasi-intertwining of $(\pi_0, P)$ and $(\hat\pi_0, \hat P)$ by the quasi-link
$\Lambda$. Then we again have the identities $\Lambda P^t = \hat P^t \Lambda$ and
$\pi_t = \hat\pi_t \Lambda$. As before, suppose further that (1.2) holds. Then,
although (if $\Lambda$ is not stochastic) we cannot do sample-path linking and so
cannot achieve $T_0(X) = T_{\hat 0}(\hat X)$, we can still conclude that $T_0(X)$
and $T_{\hat 0}(\hat X)$ have the same distribution, because

    $P(T_0(X) \le t) = \pi_t(0) = \sum_{\hat x} \hat\pi_t(\hat x)\Lambda(\hat x, 0) = \hat\pi_t(\hat 0) = P(T_{\hat 0}(\hat X) \le t)$.

Remark 1.2. The following easily verified observations will be used in our
application in Section 3.

(a) If $\Lambda_1$ is a quasi-link providing a quasi-intertwining of
$(\pi_0, P)$ and $(\pi_0^*, P^*)$ and $\Lambda_2$ is similarly a quasi-link from
$(\pi_0^*, P^*)$ to $(\hat\pi_0, \hat P)$, then $\Lambda := \Lambda_2\Lambda_1$
is a quasi-link from $(\pi_0, P)$ to $(\hat\pi_0, \hat P)$.

(b) If, additionally, the chains have respective unique absorbing states
$0, 0^*, \hat 0$, and (1.2) holds for $\Lambda_1$ and for $\Lambda_2$ (i.e.,
$\Lambda_1\delta_0^T = \delta_{0^*}^T$ and
$\Lambda_2\delta_{0^*}^T = \delta_{\hat 0}^T$), then (1.2) holds also for
$\Lambda$ (i.e., $\Lambda\delta_0^T = \delta_{\hat 0}^T$).

(c) If $\Lambda_1$ and $\Lambda_2$ in (a) are both links, then so is $\Lambda$.

2. An Illustrative Example: Block Chains and the Moran Model

2.1. Block chains. In this section we warm up to the main applications of
Sections 3-4 by providing a simple application of the technique outlined in
Section 1. Let $P$ be a Markov kernel on finite state space $\mathcal{X}$ with
the following block structure:

(2.1)    $P = \begin{pmatrix} P_{00} & P_{01} & \cdots & P_{0k} \\ P_{10} & P_{11} & \cdots & P_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ P_{k0} & P_{k1} & \cdots & P_{kk} \end{pmatrix}$.

For $i = 0, \dots, k$, let $\mu_i$ be a Perron left eigenvector of $P_{ii}$ [that
is, a nonzero row vector with nonnegative entries such that

    $\mu_i P_{ii} = \rho(P_{ii})\mu_i$,

where $\rho(A)$ denotes the spectral radius of a matrix $A$], normalized to sum
to 1. It is well known (e.g., [18, Theorem 8.3.1]) that such an eigenvector
exists; when, additionally, $P_{ii}$ is irreducible, the vector $\mu_i$ is unique
(e.g., [18, Theorem 8.4.4]) and is often called the quasi-stationary distribution
for $P_{ii}$. We make the following special assumption concerning $P$: For every
$i$ and $j$, the vector $\mu_i P_{ij}$ is proportional to $\mu_j$, say
$\mu_i P_{ij} = \hat P(i,j)\mu_j$. In words, the chain with transition matrix $P$,
started in distribution $\mu_i$ over block $i$, moves in one step to block $j$
with probability $\hat P(i,j)$; and, conditionally given that it moves to block
$j$, it "lands" in block $j$ with distribution $\mu_j$. We note in passing that
$\hat P$ is a $(k+1)$-by-$(k+1)$ matrix, and that $\hat P(i,i) = \rho(P_{ii})$
for every $i$. Define a $(k+1)$-by-$|\mathcal{X}|$ stochastic matrix $\Lambda$ by
setting

(2.2)    $\Lambda := \begin{pmatrix} \mu_0 & 0 & \cdots & 0 \\ 0 & \mu_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_k \end{pmatrix}$.

Now consider a chain $X$ with transition matrix $P$ and initial distribution
$\pi_0$; suppose that $\pi_0$ is a mixture, say
$\sum_{i=0}^{k} \hat\pi_0(i)\mu_i$, of the distributions $\mu_i$ (each of which
can be regarded naturally as a distribution on the entire state space).

Proposition 2.1. In the block-chain setting described above, $(\pi_0, P)$ and
$(\hat\pi_0, \hat P)$ are intertwined by the link $\Lambda$.

Proof. The proof is a simple matter of checking Definition 1.1 by checking that
the identity $\mu_i P_{ij} = \hat P(i,j)\mu_j$ gives
$\Lambda P = \hat P \Lambda$ and that the assumption
$\pi_0 = \sum_{i=0}^{k} \hat\pi_0(i)\mu_i$ gives $\pi_0 = \hat\pi_0\Lambda$. □

The sample-path linking developed in Section 1.1 is very simple to describe in
our present block-chain setting: $\hat X_t$ is simply the block
($\in \{0, \dots, k\}$) to which $X_t$ belongs. This simple description is due to
the very simple nature of the link (2.2); the sample-path linking is more
complicated for the applications in Sections 3-4.

2.2. The Moran model. We now apply the block-chain development in the preceding
subsection to a Markov chain on partitions of the positive integer n introduced
in [23] as a somewhat light-hearted model for collaboration among
mathematicians. Their model is precisely the Moran model from population genetics
according to the following definition [T^l Definition 2.26] modified (a) to
switch in natural fashion from continuous time to discrete time and (b) to limit
the description of the state at each unit of time by distinguishing between genes
with different labels but otherwise ignoring the values of the labels:

    A population of N genes evolves according to the Moran model
    if at exponential rate $\binom{N}{2}$ a pair of genes is sampled uniformly at
    random from the population, one dies and the other splits in two.

The chain we will consider here is a simple example of a coalescent chain, a class
popularized in the seminal works of Kingman (see for example [20], [21], [22]). For
a more complete modern picture of the application and study of coalescing chains, 
see Q3]. 

Let S be a set of n indistinguishable objects. (The objects are gene labels in
the Moran model and are mathematicians in [23].) The Markov chain of interest in
[23] is more easily described if we make use of the natural bijection between
partitions of the integer n and set partitions of S obtained by identifying a
partition $(n_1, n_2, \dots, n_r)$ (with $1 \le r < \infty$ and
$n_1 \ge n_2 \ge \cdots \ge n_r \ge 1$) of the integer n with a partition of S
into r indistinguishable subsets where the subsets are of sizes
$n_1, n_2, \dots, n_r$. Accordingly, if the present state of the Markov chain is
the partition $(n_1, \dots, n_r)$, then, viewing this as a partition of S,
uniformly select an ordered pair of unequal objects from S, and suppose that the
first and second objects are currently in subsets of size $n_i$ and $n_j$,
respectively. The transition is realized by moving the second object from the
second subset to the first, resulting in two new subsets of sizes $n_i + 1$ and
$n_j - 1$ (if the two objects lie in the same subset, the partition is
unchanged). For example, if n = 6 and the Markov chain is currently in the
partition (4, 1, 1), then with probability 8/30 the chain transitions to (5, 1);
with probability 2/30, to (4, 2); with probability 8/30, to (3, 2, 1); and with
probability 12/30 the chain stays in (4, 1, 1). The authors of [23] are concerned
with the distribution of the hitting time of state (n), the (absorbing)
single-part partition, when the chain is begun in the n-parts partition
(1, . . . , 1).
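The transition rule just described is easy to mechanize. The following sketch (the function name `step_distribution` is an illustrative choice, not notation from [23]) enumerates ordered pairs of blocks and reproduces the one-step probabilities quoted for the partition (4, 1, 1).

```python
from fractions import Fraction as F
from collections import Counter

def step_distribution(part):
    """One-step law of the partition chain from `part` (a non-increasing tuple).

    An ordered pair of distinct objects is chosen uniformly; the second object
    is moved into the subset of the first.
    """
    n = sum(part)
    total = n * (n - 1)                   # number of ordered pairs
    out = Counter()
    for a, na in enumerate(part):         # subset holding the first object
        for b, nb in enumerate(part):     # subset holding the second object
            if a == b:
                pairs = na * (na - 1)     # same subset: nothing changes
                new = part
            else:
                pairs = na * nb
                sizes = list(part)
                sizes[a] += 1
                sizes[b] -= 1
                new = tuple(sorted((s for s in sizes if s > 0), reverse=True))
            if pairs:
                out[new] += F(pairs, total)
    return dict(out)
```

For example, `step_distribution((4, 1, 1))` assigns mass 8/30 to (5, 1), 2/30 to (4, 2), 8/30 to (3, 2, 1), and 12/30 to (4, 1, 1), in agreement with the text.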

Collecting partitions into blocks, where block $i$ contains all partitions with
$i$ parts ($1 \le i \le n$), it is clear that the transition matrix $P$ for this
chain is block upper bidiagonal, since a one-step transition can only change the
number of parts by 0 or $-1$. For example, in the simple case n = 4, one possible
ordering of the partitions by decreasing number of parts is (1, 1, 1, 1),
(2, 1, 1), (2, 2), (3, 1), (4) and the corresponding $P$ is given by

    $P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 1/2 & 1/6 & 1/3 & 0 \\ 0 & 0 & 1/3 & 2/3 & 0 \\ 0 & 0 & 1/4 & 1/2 & 1/4 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$.

We will make use of results in [23] to see that $P$ satisfies the assumptions of
Section 2.1. To describe the results, let $1 \le t \le n$ and consider a
partition $r$ of n with $t$ parts. For $i = 1, \dots, n$, let $r_i$ be the number
of parts of $r$ equal to $i$, so that $\sum_i i r_i = n$. Let
$m_r := \binom{t}{r_1, \dots, r_n}$. Define $\mu_t$ to be the row vector,
supported on partitions of size $t$, whose entry corresponding to partition $r$
is $\binom{n-1}{t-1}^{-1} m_r$. For $1 \le t \le n$, define
$\lambda_t := 1 - \frac{t(t-1)}{n(n-1)}$. For example, if n = 4 and t = 2 and
partitions with 2 parts are listed (as above) in the order (2, 2), (3, 1), then
$\mu_2 = (1/3, 2/3)$ and $\lambda_2 = 5/6$. Let the dual state space be ordered
$n, n-1, \dots, 1$ (corresponding naturally to the ordering we have used for the
primary state space). Define $\Lambda$ by (2.2), but with the nonzero blocks
correspondingly in decreasing order $\mu_n, \mu_{n-1}, \dots, \mu_1$ of
subscript. Let

    $\hat P := \begin{pmatrix} \lambda_n & 1-\lambda_n & 0 & \cdots & 0 \\ 0 & \lambda_{n-1} & 1-\lambda_{n-1} & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_2 & 1-\lambda_2 \\ 0 & 0 & \cdots & 0 & \lambda_1 \end{pmatrix}$.

From Theorems 2 and 4 of [23] we can use our Proposition 2.1 to derive easily the
following intertwining result.

Proposition 2.2. Let $\pi_0$ be unit mass at the partition (1, . . . , 1). Then
$(\pi_0, P)$ and $(\delta_n, \hat P)$ are intertwined by the link $\Lambda$.
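Proposition 2.2 can be checked by brute force for small n. The sketch below is only an illustration: it rebuilds the partition chain, the link, and the bidiagonal dual matrix under the assumption (read off from the surrounding text) that the entry of $\mu_t$ at a partition is its multinomial coefficient divided by $\binom{n-1}{t-1}$; all function names are hypothetical.

```python
from fractions import Fraction as F
from math import comb, factorial
from collections import Counter

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def step_distribution(part):
    """One-step law of the partition chain (second object joins the first)."""
    n, out = sum(part), Counter()
    for a, na in enumerate(part):
        for b, nb in enumerate(part):
            if a == b:
                pairs, new = na * (na - 1), part
            else:
                sizes = list(part)
                sizes[a] += 1
                sizes[b] -= 1
                pairs = na * nb
                new = tuple(sorted((s for s in sizes if s > 0), reverse=True))
            if pairs:
                out[new] += F(pairs, n * (n - 1))
    return out

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def moran_intertwining(n):
    """Return (LAM, P, P_HAT, states) with dual states ordered n, n-1, ..., 1."""
    states = sorted(partitions(n), key=lambda p: -len(p))
    index = {p: i for i, p in enumerate(states)}
    P = [[F(0)] * len(states) for _ in states]
    for p in states:
        for q, prob in step_distribution(p).items():
            P[index[p]][index[q]] = prob

    def mu(t, p):
        if len(p) != t:
            return F(0)
        m = factorial(t)
        for mult in Counter(p).values():  # multinomial coefficient m_r
            m //= factorial(mult)
        return F(m, comb(n - 1, t - 1))

    duals = list(range(n, 0, -1))
    LAM = [[mu(t, p) for p in states] for t in duals]
    P_HAT = [[F(0)] * n for _ in duals]
    for row, t in enumerate(duals):
        lam = 1 - F(t * (t - 1), n * (n - 1))
        P_HAT[row][row] = lam
        if t > 1:
            P_HAT[row][row + 1] = 1 - lam
    return LAM, P, P_HAT, states
```

With `LAM, P, P_HAT, states = moran_intertwining(4)`, the products `matmul(LAM, P)` and `matmul(P_HAT, LAM)` coincide, and the first row of `LAM` is unit mass at (1, 1, 1, 1), which is the initial condition in Proposition 2.2.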

As a direct consequence of Proposition 2.2 we get the following hitting-time
result.

Corollary 2.3. For fixed n, the law of the time to absorption in state (n) for
the partitions-chain started in (1, . . . , 1) is that of
$\sum_{t=2}^{n} Y_{n,t}$, where the random variables
$Y_{n,t} \sim \mathrm{Geo}(1 - \lambda_{n,t})$, with
$\lambda_{n,t} = 1 - \frac{t(t-1)}{n(n-1)}$, are independent.



In [23], the authors were able to identify a simple expression for the expected
hitting time of state (n) when the chain is started in
$\pi_0 = \delta_{(1,\dots,1)}$, and challenged the reader to discover a pattern
for the associated variance. The authors found that
$E_{\pi_0} T_{(n)} = (n-1)^2$. This is confirmed by our Corollary 2.3, as

    $E_{\pi_0} T_{(n)} = E \sum_{k=2}^{n} Y_{n,k} = \sum_{k=2}^{n} \frac{n(n-1)}{k(k-1)} = (n-1)^2$.

Similarly, letting $H_n^{(2)} := \sum_{j=1}^{n} j^{-2}$ denote the nth
second-order harmonic number, we find

    $\mathrm{Var}_{\pi_0} T_{(n)} = \mathrm{Var} \sum_{k=2}^{n} Y_{n,k} = \sum_{k=2}^{n} \left[ \left( \frac{n(n-1)}{k(k-1)} \right)^2 - \frac{n(n-1)}{k(k-1)} \right]$
    $\qquad = 2[n(n-1)]^2 H_n^{(2)} - (n-1)^2 (3n^2 - 2n + 2)$
    $\qquad \sim \left( \frac{\pi^2}{3} - 3 \right) n^4$ as $n \to \infty$.
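The two closed forms above can be confirmed with exact arithmetic. This short sketch assumes, as the mean formula indicates, that Geo(p) is supported on {1, 2, ...} with mean 1/p and variance (1 - p)/p^2; the function names are illustrative only.

```python
from fractions import Fraction as F

def absorption_mean_var(n):
    """Mean and variance of sum_{k=2}^n Y_{n,k}, Y_{n,k} ~ Geo(k(k-1)/(n(n-1)))."""
    mean = var = F(0)
    for k in range(2, n + 1):
        p = F(k * (k - 1), n * (n - 1))   # success probability 1 - lambda_{n,k}
        mean += 1 / p
        var += (1 - p) / p ** 2
    return mean, var

def closed_form_var(n):
    """2[n(n-1)]^2 H_n^(2) - (n-1)^2 (3n^2 - 2n + 2)."""
    h2 = sum(F(1, j * j) for j in range(1, n + 1))
    return 2 * (n * (n - 1)) ** 2 * h2 - (n - 1) ** 2 * (3 * n * n - 2 * n + 2)
```

For n = 4, say, `absorption_mean_var(4)` gives mean 9, matching $(n-1)^2$, and its variance equals `closed_form_var(4)`.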



Proceeding further, it is not difficult to show that, when the partition chain is
started in $\pi_0$, we have

    $\frac{T_{(n)}}{n^2} \xrightarrow{\ \mathcal{L}\ } S_\infty := \sum_{j=2}^{\infty} X_j$

for independent random variables

    $X_j \sim \mathrm{Exp}(j(j-1)), \qquad j = 2, 3, \dots,$

with convergence of moments of all orders and (pointwise) of moment generating
functions. We omit the details.



3. Hitting times and interlacing eigenvalues 

3.1. Brown's theorem. Our next construction will provide insight into a
hitting-time result of Mark Brown [6] that elegantly connects the hitting time of
a state for a reversible Markov chain started in stationarity to the celebrated
interlacing eigenvalues theorem of linear algebra (see, e.g., Theorem 4.3.8 in
[18]). We now proceed to set up Brown's result.

Let $(X_t)_{t=0,1,2,\dots}$ be a time-reversible ergodic discrete-time Markov
chain with transition matrix $P$ on finite state space
$\mathcal{X} = \{0, 1, \dots, n\}$ with stationary distribution $\pi$. If we let
$D := \mathrm{diag}(\pi(0), \dots, \pi(n))$, then reversibility of $P$ implies
that $S := D^{1/2} P D^{-1/2}$ is a symmetric matrix and thus $P$ has a real
spectrum and a basis of real eigenvectors. Denote the eigenvalues of $P$ by
$1 = \theta_0 > \theta_1 \ge \cdots \ge \theta_n > -1$.

Recall that, for any matrix $A$, the principal submatrix of $A$ obtained by
deleting row 0 and column 0 is denoted $A_0$. Denote the eigenvalues of $P_0$ by
$\eta_1 \ge \cdots \ge \eta_n$. Note that $S_0 = D_0^{1/2} P_0 D_0^{-1/2}$ is
symmetric; by the interlacing eigenvalues theorem for bordered symmetric matrices
(e.g., [18, Theorem 4.3.8]), the eigenvalues of $P$ and $P_0$ interlace:
$\theta_0 > \eta_1 \ge \theta_1 \ge \cdots \ge \eta_n \ge \theta_n$. Cancel out
common pairs of eigenvalues from the spectra $\sigma(P)$ and $\sigma(P_0)$ as
follows. Consider $\sigma(P)$ and $\sigma(P_0)$ as multisets and remove the
multiset $\sigma(P) \cap \sigma(P_0)$ from each of $\sigma(P)$ and
$\sigma(P_0)$. Relabel the reduced set of eigenvalues of $P$ as
$\{\lambda_i\}_{i=0}^{r}$ with $\lambda_0 > \lambda_1 > \cdots > \lambda_r$ and
of $P_0$ as $\{\gamma_i\}_{i=1}^{r}$ with $\gamma_1 > \cdots > \gamma_r$. After
this cancellation, it is clear that the remaining eigenvalues strictly interlace:
$1 = \lambda_0 > \gamma_1 > \lambda_1 > \cdots > \gamma_r > \lambda_r > -1$.

In what follows we need to assume that $\lambda_r \ge 0$. This is a rather
harmless assumption, since we can if necessary shift attention from $P$ to
$\frac{1}{1+c}(P + cI)$ for suitably large $c$.

Brown found it convenient to work in continuous time, but he could just as easily
have proven the analogous result in our present discrete-time setting. To state
Brown's original continuous-time result, we make use of a very standard technique
to produce a continuous-time chain from a discrete-time chain, by using
independent and identically distributed (iid) Exp(1) holding times (in place of
unit times) between transitions. This continuous-time chain is sometimes called
the continuization of the Markov chain with one-step transition matrix $P$, and
it has generator matrix $Q = P - I$.

Brown's original result can be stated as follows. 

Theorem 3.1. Let $Q = P - I$ be the generator of the continuization of a Markov
chain with one-step transition matrix $P$. In the continuized chain, the
distribution (or law) $\mathcal{L}_\pi T_0$ of the hitting time of state 0 when
the chain is started in stationarity, is that of $\sum_{i=1}^{r} Y_i$, where
$Y_1, Y_2, \dots, Y_r$ are independent and the distribution of $Y_i$ is the
"modified Exponential" mixture

    $\frac{1-\gamma_i}{1-\lambda_i}\,\delta_0 + \left(1 - \frac{1-\gamma_i}{1-\lambda_i}\right)\mathrm{Exp}(1-\gamma_i)$

of unit mass at 0 and the Exponential distribution with parameter
$1 - \gamma_i$; the $\lambda$'s and $\gamma$'s are defined as above.

We find it more convenient to work in discrete time, where the corresponding
theorem (involving Geometric, rather than Exponential, distributions) is as
follows.

Theorem 3.2. In the discrete-time setting outlined above, $\mathcal{L}_\pi T_0$
is the distribution of $\sum_{i=1}^{r} Y_i$, where $Y_1, Y_2, \dots, Y_r$ are
independent with the following "modified Geometric" distributions:

(3.1)    $Y_i \sim \frac{1-\gamma_i}{1-\lambda_i}\,\delta_0 + \left(1 - \frac{1-\gamma_i}{1-\lambda_i}\right)\mathrm{Geo}(1-\gamma_i)$.

We have our choice of working in discrete or continuous time because,
fortunately, for any finite-state Markov chain and any target state 0 there is a
simple relationship between hitting-time distributions in the two cases. Let
$T_0^d$ be the time to hit state 0 in the discrete-time chain
$(X_t)_{t=0,1,2,\dots}$ with transition matrix $P$, and let $T_0^c$ be the
corresponding hitting time in the continuized chain. Then the Laplace transform
$\psi_{T_0^c}(s) := E\exp(-sT_0^c)$ and the probability generating function
$G_{T_0^d}(z) := E z^{T_0^d}$ of the hitting times satisfy a simple relationship:

Lemma 3.3. For any finite-state discrete-time Markov chain and any target state
0, we have the following identity relating the distributions of the hitting time
of state 0 for the continuized chain and the discrete-time chain:

    $\psi_{T_0^c}(s) = G_{T_0^d}\left(\frac{1}{1+s}\right), \qquad s \ge 0$.

Proof. Let $X_i \sim \mathrm{Exp}(1)$ be iid and independent of $T_0^d$. By
definition of the continuized chain, we have $T_0^c = \sum_{i=1}^{T_0^d} X_i$.
Then

    $\psi_{T_0^c}(s) = E\exp\left(-s\sum_{i=1}^{T_0^d} X_i\right) = E\left(\frac{1}{1+s}\right)^{T_0^d} = G_{T_0^d}\left(\frac{1}{1+s}\right)$.

□
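As a quick sanity check of Lemma 3.3 (a worked special case added for illustration, not taken from the source), suppose the discrete hitting time is Geometric on {1, 2, ...} with success probability p, an assumed parameter. A Geometric(p) sum of iid Exp(1) variables is Exp(p), so the continuized hitting time has Laplace transform p/(p+s), and the lemma's identity can be verified directly:

```latex
G_{T_0^d}(z) = \frac{pz}{1-(1-p)z}
\quad\Longrightarrow\quad
G_{T_0^d}\!\left(\frac{1}{1+s}\right)
  = \frac{p}{(1+s)-(1-p)}
  = \frac{p}{p+s}
  = \psi_{T_0^c}(s), \qquad s \ge 0 .
```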



HITTING TIMES AND INTERLACING EIGENVALUES 






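This lemma allows us to easily derive Theorem 3.1 from Theorem 3.2 (and vice
versa), since for $s \ge 0$ we have

    $\psi_{T_0^c}(s) = G_{T_0^d}\left(\frac{1}{1+s}\right) = \prod_{i=1}^{r}\left[\frac{1-\gamma_i}{1-\lambda_i} + \left(1 - \frac{1-\gamma_i}{1-\lambda_i}\right)\frac{\frac{1-\gamma_i}{1+s}}{1 - \frac{\gamma_i}{1+s}}\right]$
    $\qquad = \prod_{i=1}^{r}\left[\frac{1-\gamma_i}{1-\lambda_i} + \left(1 - \frac{1-\gamma_i}{1-\lambda_i}\right)\frac{1-\gamma_i}{1-\gamma_i+s}\right]$.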

Our main result of Section 3 is another proof for Theorem 3.2 culminating in our
Theorem 3.16 (see also the last paragraph of Section 3.4). Our proof provides, at
least when the quasi-link $\Lambda$ we construct is a bona fide link, an explicit
stochastic construction of the hitting time of state 0 from a stationary start as
a sum of independent modified Geometric random variables. We tackle our proof of
Theorem 3.2 in two stages: in Section 3.2 we build a certain "star chain" (random
walk on a weighted star graph) from the given chain and prove Theorem 3.2 when
this star chain is substituted for the given chain, and in Section 3.3 we attempt
to "link" the given chain with the star chain of Section 3.2. In Section 3.4 we
combine the results of Sections 3.2-3.3 and provide our complete proof of
Theorem 3.2. We could equally well prove the continuous-time analogues of all of
our theorems and then apply the analogous intertwining results outlined in
Section 2.3 of [13] to provide (again when $\Lambda$ is a link) an explicit
continuous-time stochastic construction for Theorem 3.1. We choose to work in
discrete time for convenience and because, we believe, the ideas behind our
constructions are easier to grasp in discrete time.

3.2. A stochastic construction for the star chain. Carrying out step 1 of the
four-step strategy outlined in Section 1.2 (finding a chain $\hat X$ for which
the hitting time of state $\hat 0$ can be decomposed as a sum of independent
modified Geometric random variables) turns out not to be too difficult; this step
is carried out later, in Lemma 3.10. However, step 2 (finding a link $\Lambda$
between the given $X$ and $\hat X$) proved challenging to us, so we break it down
into two substeps, as described at the end of the preceding subsection. In this
subsection we build an ergodic star chain $X^*$ from the given chain $X$ and show
that the Markov semigroups for $X^*$ (with the target state $0^*$ converted to
absorbing) and $\hat X$ are intertwined by a link $\Lambda_2$. The state spaces
for $X^*$ and $\hat X$ will both be $\{0, \dots, r\}$, and the roles of $\hat 0$
and $0^*$ will both be played by state 0. For the star chain, we make full use of
the notation in Section 3.1. The "star" has "hub" at 0 and "spokes" terminating
at vertices $1, \dots, r$. The r-spoke star chain we build has previously been
constructed in [3].

For the sake of brevity it is convenient to establish some additional notation.
Define

    $\rho_i := \frac{1-\gamma_i}{1-\lambda_i}$ for $i = 1, \dots, r$,

and for $0 \le k \le r$ define

(3.2)    $\pi^*_k(0) := \prod_{j=1}^{k} \rho_j, \qquad \pi^*_k(i) := (1-\rho_i) \prod_{1 \le j \le k,\ j \ne i} \frac{1-\gamma_j-\rho_j(1-\gamma_i)}{\gamma_i-\gamma_j}, \quad i = 1, \dots, k$.

Set $\pi^*_k := (\pi^*_k(0), \dots, \pi^*_k(k), 0, \dots, 0) \in \mathbb{R}^{r+1}$
and note that $\pi^*_0 = \delta_0$. The following lemma lays out the ergodic star
chain of interest corresponding to the given chain.






JAMES ALLEN FILL AND VINCE LYZINSKI 



Lemma 3.4.

(a) For all $0 \le k \le r$ we have $\pi^*_k(i) > 0$ for $i = 0, \dots, k$ and
$\sum_{i=0}^{k} \pi^*_k(i) = 1$.

(b) The row vector $\pi^* := \pi^*_r$ is the stationary distribution of the
ergodic r-spoke star chain with transition matrix $P^*$ satisfying, for
$i = 1, \dots, r$,

    $P^*(i,0) = 1-\gamma_i$ and $P^*(i,i) = \gamma_i$,

    $P^*(0,i) = \frac{(1-\gamma_i)\pi^*(i)}{\pi^*(0)}$ and $P^*(0,0) = 1 - \frac{1}{\pi^*(0)}\sum_{i=1}^{r}(1-\gamma_i)\pi^*(i)$.

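Proof.

(a) Fix $k \in \{0, \dots, r\}$. Clearly $\pi^*_k(0) > 0$, so we begin by showing
that $\pi^*_k(i) > 0$ for $i = 1, \dots, k$. Since $1 - \rho_i > 0$, we'll do
this by showing that each factor in the product $\prod_{j \ne i}$ is strictly
positive. Indeed, if $j > i$ this is clear because $0 < \rho_j < 1$ and
$\gamma_i > \gamma_j$. If $j < i$, then we use

    $\frac{1-\gamma_j-\rho_j(1-\gamma_i)}{\gamma_i-\gamma_j} = \frac{\rho_j(1-\gamma_i)-(1-\gamma_j)}{\gamma_j-\gamma_i} = \frac{\rho_j(\lambda_j-\gamma_i)}{\gamma_j-\gamma_i} > 0$,

where the inequality holds because $\lambda_j > \gamma_i$ by the interlacing
condition. To show $\sum_{i=0}^{k} \pi^*_k(i) = 1$, we repeat the argument in the
proof of Lemma 2.1 in [B] and include it for completeness. Define

(3.3)    $\psi(s) := \prod_{i=1}^{k} \frac{1-\gamma_i+\rho_i s}{1-\gamma_i+s}$.

Then $\psi(0) = 1$, and we will show

(3.4)    $\psi(s) = \pi^*_k(0) + \sum_{i=1}^{k} \pi^*_k(i)\,\frac{1-\gamma_i}{1-\gamma_i+s}$ for general $s$
         $\phantom{\psi(s)} = \sum_{i=0}^{k} \pi^*_k(i)$ at $s = 0$,

which will complete the argument. To show (3.4), first set

    $f(s) := \prod_{j=1}^{k}(1-\gamma_j+\rho_j s), \quad g(s) := \prod_{j=1}^{k}(1-\gamma_j+s), \quad \tilde f(s) := f(s) - \Big(\prod_{j=1}^{k}\rho_j\Big) g(s)$.

Note that $\tilde f(s)$ is a polynomial of degree $\le k-1$ and that

    $\tilde f(-1+\gamma_i) = f(-1+\gamma_i), \qquad i = 1, \dots, k$.

Define

    $h(s) := \sum_{i=1}^{k} \pi^*_k(i)(1-\gamma_i) \prod_{j \ne i:\ 1 \le j \le k} (1-\gamma_j+s)$.

A brief calculation yields

    $\pi^*_k(i)(1-\gamma_i) = \frac{f(\gamma_i-1)}{\prod_{j \ne i}(\gamma_i-\gamma_j)}$,

and we see that

    $h(\gamma_i-1) = f(\gamma_i-1) = \tilde f(\gamma_i-1), \qquad i = 1, \dots, k$.

But $h(s)$, like $\tilde f(s)$, is a polynomial of degree $\le k-1$, and the $k$
points $\gamma_i - 1$ are distinct, and so $h(s) = \tilde f(s)$ for all $s$.
Finally, we see

    $\psi(s) = \frac{f(s)}{g(s)} = \frac{\big(\prod_{j=1}^{k}\rho_j\big) g(s) + \tilde f(s)}{g(s)} = \prod_{j=1}^{k}\rho_j + \frac{h(s)}{g(s)} = \pi^*_k(0) + \sum_{i=1}^{k} \pi^*_k(i)\,\frac{1-\gamma_i}{1-\gamma_i+s}$,

establishing (3.4) and completing the proof of part (a).

(b) Clearly, $P^*\mathbf{1}^T = \mathbf{1}^T$. To show that $P^*$ is stochastic,
we need only show that $P^* \ge 0$ entrywise. This is clear except perhaps for the
entry $P^*(0,0)$. To see $P^*(0,0) > 0$, we first note that
$P^*(0,0) = \mathrm{tr}\,P^* - \mathrm{tr}\,P^*_0$; Lemma 2.6 in [6] then gives
$\mathrm{tr}\,P^* - \mathrm{tr}\,P^*_0 = \sum_{i=0}^{r}\lambda_i - \sum_{i=1}^{r}\gamma_i = \sum_{i=0}^{r-1}(\lambda_i-\gamma_{i+1}) + \lambda_r > 0$.
Part (a) establishes that $\pi^* = \pi^*_r$ is a distribution, and one sees
immediately that $\pi^*$ satisfies the detailed balance equations for the
transition matrix $P^*$. □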

Remark 3.5. It would seem natural to define a k-spokes star chain with transition
matrix $P^{*(k)}$ and stationary distribution $\pi^*_k$ for general $k$ just as
is done for $k = r$ in Lemma 3.4. However, it is then not clear whether
$P^{*(k)}(0,0) \ge 0$. Moreover, in our construction we use only the $P^*$ of
Lemma 3.4(b) (with $k = r$).



Define $P^*_{\mathrm{abs}}$ to be the transition matrix of the chain
$(X^*_t)_{t=0,1,\dots}$ modified so that 0 is an absorbing state and note that

    $\sigma(P^*_{\mathrm{abs}}) = \{1, \gamma_1, \dots, \gamma_r\}$.

We now begin to head towards Theorem 3.11, which will show that
$\mathcal{L}_{\pi^*}(T_0) = \mathcal{L}\big(\sum_{i=1}^{r} Y_i\big)$ for the
$Y_i$'s described in Theorem 3.2. To do this, we will construct a link
$\Lambda_2$ between the absorbing star chain and a dual chain
$(\hat X_t)_{t=0,1,\dots}$ for which the hitting time for state 0 is explicitly
given as an independent sum of the modified Geometric random variables $Y_i$.

Remark 3.6. If the given chain is already a star chain, then the star chain of
Lemma 3.4 is simply obtained by collapsing all leaves with the same one-step
transition probability to state 0 into a single leaf. This is established as
Proposition A.1 in the Appendix, where it is also shown that the stationary
probabilities collapse accordingly. For example, suppose the given chain is the
star chain with transition matrix

    $P = \begin{pmatrix} 4/9 & 1/9 & 1/9 & 1/9 & 1/9 & 1/9 \\ 1/6 & 5/6 & 0 & 0 & 0 & 0 \\ 1/6 & 0 & 5/6 & 0 & 0 & 0 \\ 2/9 & 0 & 0 & 7/9 & 0 & 0 \\ 1/3 & 0 & 0 & 0 & 2/3 & 0 \\ 1/3 & 0 & 0 & 0 & 0 & 2/3 \end{pmatrix}$.

We see that $\pi = \frac{1}{21}(6, 4, 4, 3, 2, 2)$ and that

    $\sigma(P) = \{1, 5/6, 0.8023, 0.7303, 2/3, 0.1896\}, \qquad \sigma(P_0) = \{5/6, 5/6, 7/9, 2/3, 2/3\}$.

The reduced set of eigenvalues of $P$ is $\{1, 0.8023, 0.7303, 0.1896\}$ and the
reduced set of eigenvalues of $P_0$ is $\{5/6, 7/9, 2/3\}$. The star chain
constructed in Lemma 3.4 has three spokes with probabilities 1/6, 2/9, 1/3 of
moving to the hub in one step and respective stationary probabilities 8/21, 3/21,
4/21 (with stationary probability 6/21 at the hub).
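The example of Remark 3.6 also gives a convenient numerical check of Theorem 3.2. The sketch below is an illustration only, with hypothetical helper names: it recovers the reduced eigenvalues of the collapsed star chain by bisection on its secular equation (a standard device for star-shaped matrices, swapped in here instead of a general eigenvalue solver), builds the modified Geometric laws (3.1), and compares their convolution with the hitting-time law read directly off the star structure.

```python
from fractions import Fraction as F

GAMMA = [F(5, 6), F(7, 9), F(2, 3)]                  # reduced eigenvalues of P_0
PI_STAR = [F(6, 21), F(8, 21), F(3, 21), F(4, 21)]   # hub first, then the spokes
# Hub row of the collapsed star chain, from detailed balance (Lemma 3.4(b)).
HUB = [(1 - g) * p / PI_STAR[0] for g, p in zip(GAMMA, PI_STAR[1:])]
HUB_STAY = 1 - sum(HUB)

def secular(x):
    """Vanishes at the eigenvalues of the star chain that are not spoke values."""
    return float(HUB_STAY) - x - sum(
        float(b * (1 - g)) / (float(g) - x) for b, g in zip(HUB, GAMMA))

def root_between(lo, hi, steps=200):
    """Bisection; `secular` is decreasing between consecutive poles."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if secular(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

def reduced_lambdas():
    """lambda_1 > lambda_2 > lambda_3, interlaced with the gammas."""
    g = [float(x) for x in GAMMA]
    return [root_between(g[1], g[0]), root_between(g[2], g[1]),
            root_between(-1.0, g[2])]

def convolve(a, b):
    return [sum(a[j] * b[t - j] for j in range(t + 1)) for t in range(len(a))]

def pmf_from_theorem(t_max):
    """Law of Y_1 + Y_2 + Y_3 with the modified Geometric laws (3.1)."""
    pmf = [1.0] + [0.0] * t_max
    for g, lam in zip(GAMMA, reduced_lambdas()):
        g = float(g)
        rho = (1 - g) / (1 - lam)
        y = [rho] + [(1 - rho) * (1 - g) * g ** (t - 1)
                     for t in range(1, t_max + 1)]
        pmf = convolve(pmf, y)
    return pmf

def pmf_direct(t_max):
    """Hitting-time law of the hub from stationarity, using the star structure."""
    out = [float(PI_STAR[0])]
    for t in range(1, t_max + 1):
        out.append(float(sum(p * g ** (t - 1) * (1 - g)
                             for p, g in zip(PI_STAR[1:], GAMMA))))
    return out
```

Here `reduced_lambdas()` should return values close to 0.8023, 0.7303, 0.1896, and `pmf_from_theorem(30)` should agree with `pmf_direct(30)` up to floating-point error. Geo(p) is again assumed to be supported on {1, 2, ...}.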

The key to our construction will be the following "spoke-breaking" theorem. 

Theorem 3.7. For each $i = 1, \dots, r$, the distribution
$\pi^*_i \in \mathbb{R}^{r+1}$ can be represented as the mixture

(3.5)    $\pi^*_i = \rho_i \pi^*_{i-1} + (1-\rho_i)\nu_i$

of $\pi^*_{i-1}$ and a probability distribution $\nu_i$ (regarded as a row vector
in $\mathbb{R}^{r+1}$) satisfying

(3.6)    $\nu_i P^* = \gamma_i \nu_i + (1-\gamma_i)\pi^*_{i-1}$.

Proof. Fix i. Clearly there is a unique row vector v = z/, satisfying (|3.5[) . and it 
sums to unity because 7r* and each do. We will solve for v and see immediately 
that v has nonncgative entries; indeed, we will show that v is given by 

(3.7) Ki) = /T^fcl<0') *l<J<i 

I if j — or j > i. 



It will then be necessary only to prove that v satisfies (|3.6|) . 

We begin by establishing (|3.7p for j = 1, . . . , i — 1. (For t = i — 1 and t = i, 
the notation Yik^j wm be shorthand for the product over values k satisfying both 
1 < k < t and k ^ j.) In that case, 

(i-pMj) = <(j)-pi<-i(j) 



2 2 

l-7fe-pfc(l-7j) . . -r-r 1 - 7fe - /9fe(l - 7j) 



(i-ft)!! — — Mi -ft) 11 

7j - 7 



i 



J l-7i -Pf(l-7j) 



l-7i -ft(l -7j) 

as desired, where the first equality follows from (|3.5[) . and the second and third 
employ the formula (|3.2[) both for ir* and for ir*_ 1 - 
For j = i we calculate 

(1 - Pi)v{i) = n*(i) - Pi-K^-iii) = <(«), 

i.e., 

1 — 7i 

1 - 7i - ft(l - 7i) 

again as desired. 

For j = 0, flU]) gives that 

i i — 1 

(i - p>(o) = <(o) - Pint.M = n Pk - Pi n Pk = o, 

fe=i A—i 

once again as desired. For j > i, (|3.7j) is clear because 7r*(j) = = flf.jXj). 

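It remains to check that $\nu$ satisfies (3.6). Since both sides are vectors summing
to 1 (on the left because $\nu$ is a probability distribution and $P^*$ is a transition
kernel, and on the right because both $\nu$ and $\pi^*_{i-1}$ are probability distributions), we
need only check $\mathrm{LHS}(j) = \mathrm{RHS}(j)$ for $j \ne 0$ (henceforth assumed). We begin by
calculating the state-$j$ entry of the LHS assuming $j \le i$:

$$\mathrm{LHS}(j) = \sum_{k=0}^{i} \nu(k) P^*(k, j) = \sum_{k=1}^{i} \nu(k) P^*(k, j) = \left[ \frac{1 - \gamma_i}{1 - \gamma_i - \rho_i(1 - \gamma_j)}\, \pi^*_i(j) \right] \times (\gamma_j).$$

On the other hand, using (3.5) we calculate

$$\begin{aligned}
\mathrm{RHS} &= \pi^*_{i-1} + \gamma_i(\nu - \pi^*_{i-1}) \\
&= \pi^*_{i-1} + \gamma_i \left[ \nu - \rho_i^{-1}\bigl(\pi^*_i - (1 - \rho_i)\nu\bigr) \right].
\end{aligned}$$

Therefore, for $j \le i$ the $j$th entry of the RHS is

$$\begin{aligned}
\mathrm{RHS}(j) &= \rho_i^{-1}(1 - \gamma_i)\, \pi^*_i(j) + \rho_i^{-1}\bigl[\gamma_i + \rho_i - 1\bigr] \nu(j) \\
&= \rho_i^{-1}(1 - \gamma_i)\, \pi^*_i(j) \left[ 1 + \frac{\gamma_i + \rho_i - 1}{1 - \gamma_i - \rho_i(1 - \gamma_j)} \right] \\
&= \rho_i^{-1}(1 - \gamma_i)\, \pi^*_i(j)\, \frac{\rho_i \gamma_j}{1 - \gamma_i - \rho_i(1 - \gamma_j)} \\
&= \mathrm{LHS}(j).
\end{aligned}$$

If $j > i$, then $\mathrm{LHS}(j) = 0 = \mathrm{RHS}(j)$, finishing the proof that $\nu$ satisfies (3.6). □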



The preceding Theorem 3.7 suggests the form for the chain $(\hat X_t)_{t = 0, 1, 2, \dots}$ on
$\{0, 1, \dots, r\}$, where the times spent in state $j = 0, 1, 2, \dots, r$ in this chain are
independent and distributed as the $Y_j$'s in Theorem 3.2. Before proceeding to the
construction in Lemma 3.10, we record some preliminaries in the next lemma.

Lemma 3.8. Let $0 \le k \le r$. Let $\hat\pi_k(j) := \rho_k \rho_{k-1} \cdots \rho_{j+1}(1 - \rho_j)$ for all $1 \le j < k$,
and let $\hat\pi_k(k) := 1 - \rho_k$. Then $\hat\pi_k(j) \ge 0$ for $1 \le j \le k$, and
$\sum_{j=1}^{k} \hat\pi_k(j) = 1 - \prod_{i=1}^{k} \rho_i$. If we define $\hat\pi_k(0) := \prod_{i=1}^{k} \rho_i$, then $\hat\pi_k$ gives a probability distribution
on $0, 1, \dots, k$.

The proof of this lemma is very easy. Let us also adopt the convention $\hat\pi_0 := \delta_0$.

Remark 3.9. Paralleling (3.5) in Theorem 3.7, we have

$$\hat\pi_k = \rho_k \hat\pi_{k-1} + (1 - \rho_k)\delta_k \quad \text{for } 1 \le k \le r.$$
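Lemma 3.8 and the recursion of Remark 3.9 are easy to confirm numerically. The following is a minimal sketch in exact arithmetic; the function name `pi_hat` and the particular values of $\rho_1, \rho_2, \rho_3$ are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F

def pi_hat(rho, k):
    """Return pi_hat_k as a list indexed by states 0..r (Lemma 3.8).

    rho[j - 1] holds rho_j; entries for states above k are zero.
    """
    r = len(rho)
    dist = [F(0)] * (r + 1)
    tail = F(1)                        # running product rho_k * ... * rho_{j+1}
    for j in range(k, 0, -1):
        dist[j] = tail * (1 - rho[j - 1])
        tail *= rho[j - 1]
    dist[0] = tail                     # rho_1 * ... * rho_k (equals 1 when k = 0)
    return dist

rho = [F(3, 4), F(1, 2), F(2, 5)]      # hypothetical values, not taken from the paper
checks = []
for k in range(1, len(rho) + 1):
    cur, prev = pi_hat(rho, k), pi_hat(rho, k - 1)
    delta_k = [F(int(j == k)) for j in range(len(rho) + 1)]
    mix = [rho[k - 1] * p + (1 - rho[k - 1]) * d for p, d in zip(prev, delta_k)]
    checks.append(sum(cur) == 1 and cur == mix)   # Lemma 3.8 and Remark 3.9
print(checks)
```

Each entry of `checks` compares $\hat\pi_k$ with the mixture $\rho_k \hat\pi_{k-1} + (1 - \rho_k)\delta_k$ and confirms that $\hat\pi_k$ sums to one.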

We are now ready to construct $(\hat X_t)$:

Lemma 3.10. Let $(\hat X_t)$ be the absorbing Markov chain with state space $\{0, \dots, r\}$
begun in distribution $\hat\pi := \hat\pi_r$, with transition matrix $\hat P$ defined by

$$\hat P(i, j) = \begin{cases} 1 & \text{if } 0 = j = i, \\ \gamma_i & \text{if } 0 < j = i, \\ (1 - \gamma_i) \cdot \hat\pi_{i-1}(j) & \text{if } j < i, \\ 0 & \text{if } j > i. \end{cases}$$

Then

(a) If $Z_i$ is the time spent in state $i$ (including time 0) by $(\hat X_t)$ with initial
distribution $\hat\pi$ prior to hitting 0, then $\mathcal{L}(Z_1, Z_2, \dots, Z_r) = \mathcal{L}(Y_1, Y_2, \dots, Y_r)$.
(b) If $\hat T_0$ is the hitting time of state 0 for the chain $(\hat X_t)$ with initial distribution
$\hat\pi$, then $\mathcal{L}(\hat T_0) = \mathcal{L}\bigl(\sum_{i=1}^{r} Y_i\bigr)$.

Proof. (a) When viewed in the right light, the lemma is evident. The chain moves
downward through the state space $\{0, 1, \dots, r\}$, with ultimate absorption in state 0,
and can be constructed by performing a sequence of $r$ independent Bernoulli trials
$W_r, \dots, W_1$ with varying success probabilities $1 - \rho_r, \dots, 1 - \rho_1$, respectively. If
$W_i = 0$, then the chain does not visit state $i$, whereas if $W_i = 1$ then the amount
of time spent in state $i$ is $\mathrm{Geom}(1 - \gamma_i)$, independent of the amounts of time spent
in the other states.

A formal proof of part (a) is not difficult but would obscure this simple construction
and is therefore not included.

(b) This is immediate from part (a), since $\hat T_0 = \sum_{i=1}^{r} Z_i$. □
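Part (b) can be illustrated by comparing, in exact arithmetic, the law of $\hat T_0$ computed from $\hat P$ with the convolution of the laws of the $Y_i$. The sketch below assumes, as the proof's description suggests, that $P(Y_i = 0) = \rho_i$ and $P(Y_i = k) = (1 - \rho_i)(1 - \gamma_i)\gamma_i^{k-1}$ for $k \ge 1$; the parameter values and all function names are hypothetical.

```python
from fractions import Fraction as F

def pi_hat(rho, k):
    """pi_hat_k of Lemma 3.8 on states 0..r (rho[j - 1] holds rho_j)."""
    dist, tail = [F(0)] * (len(rho) + 1), F(1)
    for j in range(k, 0, -1):
        dist[j] = tail * (1 - rho[j - 1])
        tail *= rho[j - 1]
    dist[0] = tail
    return dist

def build_p_hat(rho, gamma):
    """Transition matrix of Lemma 3.10."""
    r = len(rho)
    P = [[F(int(i == 0 and j == 0)) for j in range(r + 1)] for i in range(r + 1)]
    for i in range(1, r + 1):
        P[i] = [(1 - gamma[i - 1]) * x for x in pi_hat(rho, i - 1)]
        P[i][i] = gamma[i - 1]
    return P

def hitting_pmf(init, P, horizon):
    """P(T_0 = t) for t = 0..horizon, where state 0 is absorbing under P."""
    pmf, dist, prev_cdf = [], list(init), F(0)
    for _ in range(horizon + 1):
        pmf.append(dist[0] - prev_cdf)
        prev_cdf = dist[0]
        dist = [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return pmf

def y_pmf(rho_i, gamma_i, horizon):
    """Assumed law of Y_i: an atom rho_i at 0, otherwise Geom(1 - gamma_i) on 1, 2, ..."""
    return [rho_i] + [(1 - rho_i) * (1 - gamma_i) * gamma_i ** (k - 1)
                      for k in range(1, horizon + 1)]

def convolve(a, b):
    """Convolution truncated to the common length; exact on the retained indices."""
    return [sum(a[s] * b[t - s] for s in range(t + 1)) for t in range(len(a))]

rho, gamma, horizon = [F(3, 4), F(1, 2)], [F(1, 2), F(1, 4)], 8   # hypothetical values
pmf_chain = hitting_pmf(pi_hat(rho, len(rho)), build_p_hat(rho, gamma), horizon)
pmf_sum = y_pmf(rho[0], gamma[0], horizon)
for r_i, g_i in zip(rho[1:], gamma[1:]):
    pmf_sum = convolve(pmf_sum, y_pmf(r_i, g_i, horizon))
print(pmf_chain[:3], pmf_sum[:3])
```

Because the convolution is truncated at `horizon`, the comparison is exact for every $t \le$ `horizon`; no approximation is involved.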

As the culmination of this subsection we exhibit an intertwining between $(\pi^*, P^*_{\mathrm{abs}})$
and $(\hat\pi, \hat P)$.

Theorem 3.11. Let $\Lambda_2$ be defined as follows:

$$\Lambda_2(0, :) := \delta_0, \qquad \Lambda_2(i, :) := \nu_i \ \text{ for } i = 1, \dots, r.$$

Then $(\pi^*, P^*_{\mathrm{abs}})$ and $(\hat\pi, \hat P)$ are intertwined by the link $\Lambda_2$, which satisfies (1.2); to
wit,

$$\Lambda_2 P^*_{\mathrm{abs}} = \hat P \Lambda_2, \tag{3.8}$$

$$\pi^* = \hat\pi \Lambda_2, \tag{3.9}$$

$$\Lambda_2 \delta_0^T = \delta_0^T. \tag{3.10}$$

Proof. We begin by noting that $\Lambda_2$ is stochastic because, as noted in Theorem 3.7,
each $\nu_i$ is a probability distribution.

From Theorem 3.7 we have that $\pi^*_k = \rho_k \pi^*_{k-1} + (1 - \rho_k)\nu_k$ for $1 \le k \le r$,
and from Remark 3.9 we have the corresponding equations for $\hat\pi_k$, namely, $\hat\pi_k =
\rho_k \hat\pi_{k-1} + (1 - \rho_k)\delta_k$ for all $1 \le k \le r$. One can use these results to prove $\pi^*_k = \hat\pi_k \Lambda_2$
for $k = 0, 1, \dots, r$ by induction on $k$; in particular, (3.9) follows by setting $k = r$.

To show (3.8), first observe $(\Lambda_2 P^*_{\mathrm{abs}})(0, :) = \delta_0 = (\hat P \Lambda_2)(0, :)$. Comparing $i$th
rows for $1 \le i \le r$, we see

$$(\Lambda_2 P^*_{\mathrm{abs}})(i, :) = \nu_i P^*_{\mathrm{abs}} = \gamma_i \nu_i + (1 - \gamma_i)\pi^*_{i-1} \tag{3.11}$$

by (3.6) and the fact that $\nu_i(0) = 0$ for all $i$. Iterating Theorem 3.7 we see for
$i = 1, \dots, r$ that

$$\begin{aligned}
\pi^*_i &= (1 - \rho_i)\nu_i + \rho_i \pi^*_{i-1} \\
&= (1 - \rho_i)\nu_i + \rho_i\bigl[(1 - \rho_{i-1})\nu_{i-1} + \rho_{i-1}\pi^*_{i-2}\bigr] \\
&= (1 - \rho_i)\nu_i + \rho_i(1 - \rho_{i-1})\nu_{i-1} + \rho_i \rho_{i-1}\pi^*_{i-2} \\
&= \cdots = \hat\pi_i(i)\nu_i + \hat\pi_i(i-1)\nu_{i-1} + \cdots + \hat\pi_i(1)\nu_1 + \hat\pi_i(0)\delta_0.
\end{aligned}$$

So $\pi^*_i = \sum_{j=1}^{i} \hat\pi_i(j)\nu_j + \hat\pi_i(0)\delta_0$ for $i = 1, \dots, r$, and the same equation holds for
$i = 0$ because $\pi^*_0 = \delta_0 = \hat\pi_0$. Applying this to equation (3.11) we find for $i = 1, \dots, r$
that

$$\begin{aligned}
(\Lambda_2 P^*_{\mathrm{abs}})(i, :) &= \gamma_i \nu_i + \sum_{j=1}^{i-1} (1 - \gamma_i)\hat\pi_{i-1}(j)\nu_j + (1 - \gamma_i)\hat\pi_{i-1}(0)\delta_0 \\
&= (\hat P \Lambda_2)(i, :),
\end{aligned}$$

as desired, where at the last equality we have recalled $\Lambda_2(0, :) = \delta_0$.

Finally, (3.10) asserts that the 0th column of $\Lambda_2$ is $\delta_0^T$. This follows from the
definition of $\Lambda_2$, since it has already been noted at (3.7) that $\nu_i(0) = 0$ for $i =
1, \dots, r$. □
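The three identities (3.8)-(3.10) can be verified mechanically on a small instance. The sketch below makes two assumptions that should be checked against the full text: it takes formula (3.2) to be the product formula $\pi^*_i(j) = (1 - \rho_j)\prod_{k \ne j}^{i} \frac{1 - \gamma_k - \rho_k(1 - \gamma_j)}{\gamma_j - \gamma_k}$ with $\pi^*_i(0) = \prod_{k \le i} \rho_k$, and it uses hypothetical values of $\rho$ and $\gamma$ (with $r = 2$) chosen so that all entries are nonnegative. All function names are illustrative.

```python
from fractions import Fraction as F

def pi_hat(rho, k):
    """pi_hat_k of Lemma 3.8 on states 0..r."""
    dist, tail = [F(0)] * (len(rho) + 1), F(1)
    for j in range(k, 0, -1):
        dist[j] = tail * (1 - rho[j - 1])
        tail *= rho[j - 1]
    dist[0] = tail
    return dist

def pi_star(rho, gamma, i):
    """pi*_i on states 0..r via the product formula assumed for (3.2)."""
    dist = [F(0)] * (len(rho) + 1)
    dist[0] = F(1)
    for k in range(1, i + 1):
        dist[0] *= rho[k - 1]
    for j in range(1, i + 1):
        val = 1 - rho[j - 1]
        for k in range(1, i + 1):
            if k != j:
                val *= ((1 - gamma[k - 1] - rho[k - 1] * (1 - gamma[j - 1]))
                        / (gamma[j - 1] - gamma[k - 1]))
        dist[j] = val
    return dist

def nu(rho, gamma, i):
    """nu_i as given by (3.7)."""
    star, out = pi_star(rho, gamma, i), [F(0)] * (len(rho) + 1)
    g_i, r_i = gamma[i - 1], rho[i - 1]
    for j in range(1, i + 1):
        out[j] = (1 - g_i) * star[j] / (1 - g_i - r_i * (1 - gamma[j - 1]))
    return out

def matmul(A, B):
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]

rho, gamma = [F(3, 4), F(1, 2)], [F(1, 2), F(1, 4)]   # hypothetical values
r = len(rho)
delta0 = [F(int(j == 0)) for j in range(r + 1)]

lambda2 = [delta0] + [nu(rho, gamma, i) for i in range(1, r + 1)]
p_star_abs, p_hat = [delta0], [delta0]
for i in range(1, r + 1):
    row = [F(0)] * (r + 1)
    row[0], row[i] = 1 - gamma[i - 1], gamma[i - 1]     # absorbing star chain
    p_star_abs.append(row)
    row = [(1 - gamma[i - 1]) * x for x in pi_hat(rho, i - 1)]
    row[i] = gamma[i - 1]                               # chain of Lemma 3.10
    p_hat.append(row)

eq_3_8 = matmul(lambda2, p_star_abs) == matmul(p_hat, lambda2)
eq_3_9 = matmul([pi_hat(rho, r)], lambda2)[0] == pi_star(rho, gamma, r)
eq_3_10 = [row[0] for row in lambda2] == delta0
print(eq_3_8, eq_3_9, eq_3_10)
```

For these values the link is $\Lambda_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3/4 & 1/4 \end{pmatrix}$, which is stochastic, as Theorem 3.11 asserts.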

3.3. Quasi-link to the star chain. The main result of this subsection is Theorem 3.13,
which provides a quasi-link between the absorbing transition matrices
$P_{\mathrm{abs}}$ and $P^*_{\mathrm{abs}}$ corresponding to the given chain and the star chain, respectively. We
begin with a linear-algebraic lemma.

Lemma 3.12. The matrix $P_{\mathrm{abs}}$ has $n + 1$ linearly independent left eigenvectors.
Its multiset of $n + 1$ eigenvalues is $\{1, \eta_1, \dots, \eta_n\}$.

Proof. Recall that $\sigma(P_0) = \{\eta_1, \dots, \eta_n\}$. Recall also that $D_0 = \mathrm{diag}(\pi_1, \dots, \pi_n)$
and that $S_0 = D_0^{1/2} P_0 D_0^{-1/2}$ is a symmetric matrix. Let $\tilde U$ be an $n$-by-$n$ orthogonal
matrix whose rows are orthonormal left eigenvectors of $S_0$, so that $\tilde U S_0 \tilde U^T =
\mathrm{diag}(\eta_1, \eta_2, \dots, \eta_n)$. Then the rows (denoted $u_1, \dots, u_n$) of the $n$-by-$n$ matrix
$U := \tilde U D_0^{1/2}$ are left eigenvectors of $P_0$ with respective eigenvalues $\eta_1, \dots, \eta_n$. For
$i = 1, \dots, n$, define the scalar

$$w_i := \frac{(0 \mid u_i) P(:, 0)}{\eta_i - 1};$$

then $(w_i \mid u_i) P_{\mathrm{abs}} = \eta_i (w_i \mid u_i)$ and $\eta_i \in \sigma(P_{\mathrm{abs}})$. Finally, $\delta_0 P_{\mathrm{abs}} = \delta_0$. The $n + 1$
eigenvectors $\delta_0$ and $(w_i \mid u_i)$ for $i = 1, \dots, n$ are clearly linearly independent, and
our proof is complete. □

Note that

$$(w_i \mid u_i)\mathbf{1}^T = (w_i \mid u_i) P_{\mathrm{abs}} \mathbf{1}^T = \eta_i (w_i \mid u_i)\mathbf{1}^T$$

and $\eta_i < 1$, implying that $(w_i \mid u_i)\mathbf{1}^T = 0$ and $w_i = -u_i \mathbf{1}^T$.

Let $n_i$ denote the algebraic (also geometric) multiplicity of the eigenvalue $\gamma_i$
as an eigenvalue of $P_0$ (here we are working with the reduced set of eigenvalues
again). Relabel the eigenvectors corresponding to $\gamma_i$ by $u^i_1, \dots, u^i_{n_i}$. Note that,
when viewed as an eigenvalue of $P_{\mathrm{abs}}$, $\gamma_i$ has algebraic (also geometric) multiplicity
$n_i$, with corresponding eigenvectors $(-u^i_1 \mathbf{1}^T \mid u^i_1), \dots, (-u^i_{n_i} \mathbf{1}^T \mid u^i_{n_i})$. In the next
theorem we construct our $(r + 1)$-by-$(n + 1)$ quasi-link $\Lambda_1$ between $(\pi, P_{\mathrm{abs}})$ and
$(\pi^*, P^*_{\mathrm{abs}})$.

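Theorem 3.13. There exists a quasi-link $\Lambda_1$ providing a quasi-intertwining between
$(\pi, P_{\mathrm{abs}})$ and $(\pi^*, P^*_{\mathrm{abs}})$ and satisfying (1.2), i.e., a matrix $\Lambda_1$ with rows summing
to 1 such that

$$\pi = \pi^* \Lambda_1, \tag{3.12}$$

$$\Lambda_1 P_{\mathrm{abs}} = P^*_{\mathrm{abs}} \Lambda_1, \tag{3.13}$$

$$\Lambda_1 \delta_0^T = \delta_0^T. \tag{3.14}$$

Proof. If row $i$ of $\Lambda_1$ is denoted by $x_i$ for $i = 0, \dots, r$, then for (3.13) we require

$$x_0 P_{\mathrm{abs}} = x_0; \qquad x_i P_{\mathrm{abs}} = (1 - \gamma_i) x_0 + \gamma_i x_i, \quad i = 1, \dots, r.$$

This forces $x_0 = \delta_0$ and

$$x_i (P_{\mathrm{abs}} - \gamma_i I) = (1 - \gamma_i)\delta_0, \quad i = 1, \dots, r.$$

Therefore, for $\Lambda_1 P_{\mathrm{abs}} = P^*_{\mathrm{abs}} \Lambda_1$ to hold, we necessarily set

$$x_i = \delta_0 + \sum_{j=1}^{n_i} c^i_j \bigl(-u^i_j \mathbf{1}^T \mid u^i_j\bigr), \quad i = 1, \dots, r,$$

where the $c^i_j$'s are soon-to-be-determined real constants.

For any choices of $c^i_j$'s above we have that the rows of $\Lambda_1$ sum to unity and
$\Lambda_1 P_{\mathrm{abs}} = P^*_{\mathrm{abs}} \Lambda_1$. It remains to be shown that we can define $c^i_j$'s so that (3.12)
holds. The difficulty is that there may exist values $\eta_i \in \sigma(P_0)$ such that $\eta_i \ne \gamma_j$
for any $j = 1, \dots, r$. However, we will show in the next lemma that $\pi$ is in the
span of the eigenvectors corresponding to the remaining eigenvalues, and that will
complete our proof of (3.12).

To prove (3.14), we use (3.12)-(3.13) to get $\pi P^t_{\mathrm{abs}} = \pi^* \Lambda_1 P^t_{\mathrm{abs}} = \pi^* (P^*_{\mathrm{abs}})^t \Lambda_1$; we
find [using $\Lambda_1(0, 0) = 1$] that the 0th entry of this vector is

$$\begin{aligned}
P_\pi(T_0 \le t) &= \sum_i \pi^*(i) \sum_j (P^*_{\mathrm{abs}})^t(i, j) \Lambda_1(j, 0) \\
&= \pi^*(0) + \sum_{i=1}^{r} \pi^*(i)\bigl[(P^*_{\mathrm{abs}})^t(i, 0) + (P^*_{\mathrm{abs}})^t(i, i)\Lambda_1(i, 0)\bigr] \\
&= \pi^*(0) + \sum_{i=1}^{r} \pi^*(i)\bigl[1 + (P^*_{\mathrm{abs}})^t(i, i)(\Lambda_1(i, 0) - 1)\bigr] \\
&= \pi^*(0) + \sum_{i=1}^{r} \pi^*(i)\bigl[1 + \gamma_i^t(\Lambda_1(i, 0) - 1)\bigr] \\
&= 1 + \sum_{i=1}^{r} \pi^*(i)\gamma_i^t(\Lambda_1(i, 0) - 1).
\end{aligned}$$

We also have from (3.15) in the proof of the next lemma that $P_\pi(T_0 \le t) =
1 - \sum_{i=1}^{r} \pi^*(i)\gamma_i^t$. Therefore $\Lambda_1(i, 0) = 0$ for $i > 0$, and (3.14) follows. □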

Lemma 3.14. There exist real constants $c^i_j$ such that $\pi = \pi^* \Lambda_1$.

Proof. We will make use of the fact that

$$P_\pi(T_0 > t) = \sum_{j=1}^{r} \pi^*(j)\gamma_j^t, \quad t = 0, 1, \dots, \tag{3.15}$$

which follows from its continuous-time analogue, equation (1.1) in [6], using Lemma 3.3.
[That analogue is established using the fact that the function $\psi$ in our equations
(3.3)-(3.4) is the Laplace transform of $T_0$ for the stationary continuized chain;
see [6] for further details.] Define

$$\pi_{-0} := (\pi(1), \dots, \pi(n)) \in \mathbb{R}^n;$$

we would use the notation $\pi_0$ to indicate this deletion of the 0th entry from $\pi$ except
that it conflicts with our notation for the initial distribution of the given chain. We
then have that $P_\pi(T_0 > t) = \pi_{-0} P_0^t \mathbf{1}^T$. Using the spectral representation of $P_0$ we
find for $t \ge 0$ that

$$P_\pi(T_0 > t) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sqrt{\pi(i)}\sqrt{\pi(j)}\, \tilde U(k, i) \tilde U(k, j)\, \eta_k^t = \sum_{k=1}^{n} q_k \eta_k^t. \tag{3.16}$$

Here $q = (q_1, \dots, q_n) = (\pi_{-0}^{1/2} \tilde U^T)^2$, where both the nonnegative square root and
the square are in the Hadamard sense. In particular, $q_k \ge 0$ for all $k = 1, \dots, n$.
Comparing (3.15) and (3.16), it is clear that if $\eta_i \ne \gamma_j$ for every $j = 1, \dots, r$,
then $q_i = 0$. Again comparing (3.15) and (3.16), for each $\gamma_j$ there is an $\eta_k = \gamma_j$
such that the coefficient of $\eta_k^t$ in (3.16), namely $q_k$, is strictly positive. Now $q =
(\pi_{-0}^{1/2} \tilde U^T)^2$ equals the Hadamard square $(\pi_{-0} D_0^{-1/2} \tilde U^T)^2$. We can therefore choose
$R$, a diagonal matrix with $\pm 1$ along the diagonal, such that $\pi_{-0} = q^{1/2} R (\tilde U D_0^{1/2}) =
q^{1/2} R U$; here $q^{1/2}$ is the Hadamard nonnegative square root of $q$. Relabel the
entries of the vector $q$ (and of $R$) so that

$$\pi_{-0} = \sum_{i=1}^{r} \sum_{j=1}^{n_i} r^i_j (q^i_j)^{1/2} u^i_j.$$

Letting $c^i_j = r^i_j (q^i_j)^{1/2} / \pi^*(i)$ yields

$$\pi_{-0} = \sum_{i=1}^{r} \sum_{j=1}^{n_i} \pi^*(i)\, c^i_j\, u^i_j.$$

It remains only to show that for this choice of $c^i_j$'s we have

$$\pi(0) = 1 + \sum_{i=1}^{r} \sum_{j=1}^{n_i} \pi^*(i)\, c^i_j\, (-u^i_j \mathbf{1}^T).$$

This is immediate from

$$1 - \pi(0) = \pi_{-0} \mathbf{1}^T = q^{1/2} R U \mathbf{1}^T = \sum_{i=1}^{r} \sum_{j=1}^{n_i} r^i_j (q^i_j)^{1/2} (u^i_j \mathbf{1}^T). \qquad □$$

Our construction of $\Lambda_1$ uses the eigenvectors of $P_{\mathrm{abs}}$; the entries of these
eigenvectors are not all nonnegative, and as a result neither (in general) are the entries
of $\Lambda_1$. In the special case that the given chain is a star chain, the quasi-link $\Lambda_1$ is a
bona fide link. For example, for the chain considered in Remark 3.6, the quasi-link
$\Lambda_1$ is easily seen to be the link

$$\Lambda_1 = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/2 & 1/2 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2
\end{pmatrix}.$$

Remark 3.15. If $r = n$ (i.e., the reduced spectra are the same as the unreduced
spectra), then it is not hard to show that the quasi-link $\Lambda_1$ of Theorem 3.13 is
uniquely determined.
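For the star-chain example of Remark 3.6, the three properties (3.12)-(3.14) of the link $\Lambda_1$ can be confirmed directly. The sketch below assumes the state ordering in which leaves 1 and 2 move to the hub with probability $1/6$, leaf 3 with probability $2/9$, and leaves 4 and 5 with probability $1/3$; the variable names are illustrative only.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]

h = F(1, 2)
# Given star chain with the hub made absorbing (state order is an assumption).
P_abs = [[1, 0, 0, 0, 0, 0],
         [F(1, 6), F(5, 6), 0, 0, 0, 0],
         [F(1, 6), 0, F(5, 6), 0, 0, 0],
         [F(2, 9), 0, 0, F(7, 9), 0, 0],
         [F(1, 3), 0, 0, 0, F(2, 3), 0],
         [F(1, 3), 0, 0, 0, 0, F(2, 3)]]
# Collapsed star chain of Lemma 3.4, again with the hub absorbing.
P_star_abs = [[1, 0, 0, 0],
              [F(1, 6), F(5, 6), 0, 0],
              [F(2, 9), 0, F(7, 9), 0],
              [F(1, 3), 0, 0, F(2, 3)]]
Lambda1 = [[1, 0, 0, 0, 0, 0],
           [0, h, h, 0, 0, 0],
           [0, 0, 0, 1, 0, 0],
           [0, 0, 0, 0, h, h]]
pi_star = [F(6, 21), F(8, 21), F(3, 21), F(4, 21)]

pi = matmul([pi_star], Lambda1)[0]                                    # (3.12)
intertwined = matmul(Lambda1, P_abs) == matmul(P_star_abs, Lambda1)   # (3.13)
first_column = [row[0] for row in Lambda1]                            # (3.14)
print(pi, intertwined, first_column)
```

Each nonzero block of a row of $\Lambda_1$ is the uniform distribution on the leaves that were merged into the corresponding spoke.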
3.4. The big link $\Lambda$. Combining the quasi-link $\Lambda_1$ of Theorem 3.13 between
$(\pi, P_{\mathrm{abs}})$ and $(\pi^*, P^*_{\mathrm{abs}})$ and the link $\Lambda_2$ of Theorem 3.11 between $(\pi^*, P^*_{\mathrm{abs}})$ and
$(\hat\pi, \hat P)$, we obtain the desired quasi-link $\Lambda = \Lambda_2 \Lambda_1$ between $(\pi, P_{\mathrm{abs}})$ and $(\hat\pi, \hat P)$.

Theorem 3.16. Let $\Lambda := \Lambda_2 \Lambda_1$. Then $\Lambda$ is a quasi-link providing a quasi-intertwining
of $(\pi, P_{\mathrm{abs}})$ and $(\hat\pi, \hat P)$, and therefore $\mathcal{L}_\pi T_0 = \mathcal{L}_{\hat\pi} \hat T_0$.

Proof. This follows from Remark 1.2 and the discussion in Section 1.3. □

If $\Lambda$ is stochastic, then we have a link between $P_{\mathrm{abs}}$ and $\hat P$ and we can use
the discussion following Definition 1.1 to construct a sample path of $(X_t)$ given
a realization of $(\hat X_t)$. However, it is easy to find examples showing that $\Lambda$ is not
nonnegative in general.

The discussion preceding Remark 3.15 shows that $\Lambda$ is a link if the given chain $X$
is a star chain. More generally, $\Lambda$ is a link if the given chain is a "block star
chain", defined as follows: Choose positive numbers $b_0, \dots, b_k$ summing to unity
and $0 < \pi_0 < 1$. For $i = 1, \dots, k$, let $c_i := \pi_0 b_i$ and let $Q_i$ be an ergodic and
reversible Markov kernel with stationary probability mass function $\pi_i$. Let $P$ be
the following special case of (2.1):

$$P = \begin{pmatrix}
b_0 & b_1 \pi_1 & b_2 \pi_2 & \cdots & b_k \pi_k \\
c_1 \mathbf{1}^T & (1 - c_1) Q_1 & 0 & \cdots & 0 \\
c_2 \mathbf{1}^T & 0 & (1 - c_2) Q_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c_k \mathbf{1}^T & 0 & 0 & \cdots & (1 - c_k) Q_k
\end{pmatrix}.$$

It is easily checked that $P$ is ergodic and reversible with stationary distribution equal
to the concatenated row vector $(\pi_0 + k)^{-1}(\pi_0 \mid \pi_1 \mid \cdots \mid \pi_k)$, and that the reduction
of spectra described in Section 3.1 results in $\{\gamma_1, \dots, \gamma_r\}$ being some subset of
distinct elements from $\{1 - c_1, \dots, 1 - c_k\}$. If, for example, $r = k$, then $\Lambda_1$ is the
matrix (2.2), where $\mu_0 = (1)$ is 1-by-1 and we recall for $1 \le j \le k$ that $\mu_j$ ($= \pi_j$) is
the quasi-stationary distribution for the $j$th diagonal block $(1 - c_j) Q_j$ of $P$; hence
$\Lambda_1$ is a link (and so, then, is $\Lambda = \Lambda_2 \Lambda_1$). We are not aware of other interesting
cases where $\Lambda$ is guaranteed to be a link, but the key is to arrange, as for block
star chains, for $P$ to have nonnegative eigenvectors corresponding to eigenvalues
$\gamma_1, \dots, \gamma_r$.

Remark 3.17. Is there a unique quasi-link $\Lambda$ which, like the one constructed in
Theorem 3.16, satisfies $\Lambda \delta_0^T = \delta_0^T$ and provides a quasi-intertwining of $(\pi, P_{\mathrm{abs}})$ and
$(\hat\pi, \hat P)$? We do not know the answer in general, but if $r = n$, then the answer is
affirmative by Remark 3.15 and the invertibility of $\Lambda_2$.

4. Another representation for hitting times from stationarity

Our final application of the strategy outlined in Section 1.2 will provide a stochastic
construction for an alternative characterization of the hitting-time distribution
from stationarity first proved by Mark Brown [personal communication] in an
unpublished technical report. A published version of a special case can be found
in [5]. Our construction here is notable in that it will provide a generalization (to
not necessarily reversible chains) of the discrete-time analogue of Brown's original
result, and it is by applying our strategy that we discovered the generalization.

Brown's original theorem is the following, in which 0 is an arbitrary fixed state.

Theorem 4.1 (Mark Brown). Consider an ergodic time-reversible finite-state
continuous-time Markov chain with stationary distribution $\pi$. Let $V$ be a random
variable with

$$P(V > t) = \frac{P_t(0, 0) - \pi(0)}{1 - \pi(0)}, \quad 0 \le t < \infty.$$

Let $V_1, V_2, \dots$ be iid copies of $V$, and let $N$ be independent of the sequence $(V_i)$ with
$N + 1$ distributed Geometric with success probability $\pi(0)$:

$$P(N = k) = \pi(0)[1 - \pi(0)]^k, \quad k = 0, 1, \dots.$$

Then the distribution $\mathcal{L}_\pi T_0$ of the nonnegative hitting time $T_0$ of 0 from a stationary
start is the distribution of $\sum_{i=1}^{N} V_i$.

We will focus on the following discrete-time analogue. As in Section 3, analogues
of all of our results can be established in the continuous-time setting as well, but
we have chosen discrete time for convenience and ease of understanding.

Theorem 4.2. Consider an ergodic time-reversible finite-state discrete-time Markov
chain with stationary distribution $\pi$. Assume that $P^t(0, 0)$ is nonincreasing in $t$.
Let $V$ be a random variable with

$$P(V > t) = \frac{P^t(0, 0) - \pi(0)}{1 - \pi(0)}, \quad t = 0, 1, \dots.$$

Let $V_1, V_2, \dots$ be iid copies of $V$, and let $N$ be independent of the sequence $(V_i)$ with
$N + 1$ distributed Geometric with success probability $\pi(0)$:

$$P(N = k) = \pi(0)[1 - \pi(0)]^k, \quad k = 0, 1, \dots.$$

Then the distribution $\mathcal{L}_\pi T_0$ of the nonnegative hitting time $T_0$ of 0 from a stationary
start is the distribution of $\sum_{i=1}^{N} V_i$.
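Theorem 4.2 lends itself to a direct numerical check. Since $N$ is geometric, the sum $S = \sum_{i=1}^{N} V_i$ satisfies the renewal recursion $P(S = t) = \pi(0)\mathbf{1}\{t = 0\} + [1 - \pi(0)]\sum_{k=1}^{t} P(V = k) P(S = t - k)$, which is what the sketch below uses. The three-state lazy chain is a hypothetical example (reversible, with eigenvalues $1, 3/4, 1/4$), not one taken from the paper.

```python
from fractions import Fraction as F

# Hypothetical example: lazy reversible chain on {0, 1, 2} with uniform stationary law.
P = [[F(3, 4), F(1, 4), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 4), F(3, 4)]]
pi = [F(1, 3)] * 3
horizon = 10

def step(row, M):
    """One step of a row vector through the matrix M."""
    return [sum(row[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

# Return probabilities P^t(0, 0) and the tail P(V > t) of Theorem 4.2.
ret, row = [], [F(1), F(0), F(0)]
for _ in range(horizon + 1):
    ret.append(row[0])
    row = step(row, P)
tail_V = [(p - pi[0]) / (1 - pi[0]) for p in ret]
pmf_V = [F(0)] + [tail_V[t - 1] - tail_V[t] for t in range(1, horizon + 1)]

# Law of the geometric sum via the renewal recursion described above.
pmf_S = []
for t in range(horizon + 1):
    val = pi[0] if t == 0 else F(0)
    val += (1 - pi[0]) * sum(pmf_V[k] * pmf_S[t - k] for k in range(1, t + 1))
    pmf_S.append(val)

# Law of T_0 from a stationary start, with state 0 made absorbing.
P_abs = [[F(1), F(0), F(0)]] + P[1:]
pmf_T, row, prev = [], list(pi), F(0)
for _ in range(horizon + 1):
    pmf_T.append(row[0] - prev)
    prev = row[0]
    row = step(row, P_abs)

print(pmf_S[:3], pmf_T[:3])
```

The monotonicity assumption of the theorem is what makes `pmf_V` nonnegative; for this example all entries of `pmf_V` are indeed nonnegative.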

The assumption in Theorem 4.2 that $P^t(0, 0)$ is nonincreasing in $t$ is met, for
example, if the chain is time-reversible and all the eigenvalues of the one-step
transition matrix $P$ are nonnegative. However, we do not need to assume
reversibility to follow our approach, so Theorem 4.2 (and likewise Theorem 4.1)
is true without that assumption. For a non-reversible scenario in which the
nonincreasingness assumption is satisfied, see Remark 4.7 and the paragraph preceding it.

Following our strategy, we aim to provide a sample-path intertwining of the
given chain $X$ in Theorem 4.2 with a chain $\hat X$ (with, say, initial distribution $\hat\pi_0$
and transition matrix $\hat P$) for which the hitting time $\hat T_0$ has (for each sample path)
a clear decomposition $\sum_{i=1}^{N} V_i$ as in the theorem. As in our earlier application,
we can treat 0 as an absorbing state for the given chain, whose one-step transition
matrix we then denote by $P_{\mathrm{abs}}$. We thus wish to find $(\hat\pi_0, \hat P)$ and a link (or at
least quasi-link) $\Lambda$ such that $\pi = \hat\pi_0 \Lambda$ and $\Lambda P_{\mathrm{abs}} = \hat P \Lambda$. The chain $\hat X$ we will
construct has state space $\{0, 1, \dots\}$. Although the state space is infinite, this gives
no difficulties as the needed intertwining results from [9] apply just as readily to
Markov chains with countably infinite state spaces. First we construct our $\Lambda$.

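Suppose the given chain has state space $\{0, 1, \dots, n\}$. We adopt notation that
highlights the special role of state 0. Let $\pi = (\pi(0) \mid \pi_{-0}) \in \mathbb{R}^{n+1}$ with $\pi_{-0} \in \mathbb{R}^n$,
and similarly let $P^{i-1}(0, :) = (P^{i-1}(0, 0) \mid P^{i-1}(0, :)_{-0}) \in \mathbb{R}^{n+1}$. For $i = 1, 2, 3, \dots$,
define

$$\mu_i := \left( 0 \;\middle|\; \frac{P^{i-1}(0, 0)\, \pi_{-0} - \pi(0)\, P^{i-1}(0, :)_{-0}}{P^{i-1}(0, 0) - \pi(0)} \right) \in \mathbb{R}^{n+1}.$$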






JAMES ALLEN FILL AND VINCE LYZINSKI 



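Lemma 4.3. With $\mu_i$ defined above, we have for $i > 0$ that

$$\mu_i P = q_i \pi + (1 - q_i)\mu_{i+1},$$

where

$$q_i := \frac{P^{i-1}(0, 0) - P^i(0, 0)}{P^{i-1}(0, 0) - \pi(0)} \in [0, 1).$$

Proof. First note

$$\mu_i P = \frac{P^{i-1}(0, 0)(0 \mid \pi_{-0})P - \pi(0)(0 \mid P^{i-1}(0, :)_{-0})P}{P^{i-1}(0, 0) - \pi(0)}.$$

Now $(0 \mid \pi_{-0})P = \pi - \pi(0)P(0, :)$, and similarly

$$\begin{aligned}
(0 \mid P^{i-1}(0, :)_{-0})P &= P^{i-1}(0, :)P - P^{i-1}(0, 0)P(0, :) \\
&= P^i(0, :) - P^{i-1}(0, 0)P(0, :);
\end{aligned}$$

hence

$$\begin{aligned}
& P^{i-1}(0, 0)(0 \mid \pi_{-0})P - \pi(0)(0 \mid P^{i-1}(0, :)_{-0})P \\
&\quad = P^{i-1}(0, 0)\pi - \pi(0)P^i(0, :) \\
&\quad = P^{i-1}(0, 0)\pi - (P^i(0, 0)\pi(0) \mid 0) - (0 \mid \pi(0)P^i(0, :)_{-0}) \\
&\quad = P^{i-1}(0, 0)\pi - P^i(0, 0)\pi + P^i(0, 0)(0 \mid \pi_{-0}) - (0 \mid \pi(0)P^i(0, :)_{-0}) \\
&\quad = [P^{i-1}(0, 0) - P^i(0, 0)]\pi + (0 \mid P^i(0, 0)\pi_{-0} - \pi(0)P^i(0, :)_{-0}).
\end{aligned}$$

Letting

$$q_i = \frac{P^{i-1}(0, 0) - P^i(0, 0)}{P^{i-1}(0, 0) - \pi(0)},$$

it follows that $\mu_i P = q_i \pi + (1 - q_i)\mu_{i+1}$, as desired. □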
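The identity of Lemma 4.3 can be checked row by row in exact arithmetic. The sketch below reuses a hypothetical three-state lazy chain (not from the paper); the helper names `mu` and `q` mirror the notation of the lemma and are otherwise illustrative.

```python
from fractions import Fraction as F

# Hypothetical example: lazy reversible chain on {0, 1, 2} with uniform stationary law.
P = [[F(3, 4), F(1, 4), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 4), F(3, 4)]]
pi = [F(1, 3)] * 3

def step(row, M):
    """One step of a row vector through the matrix M."""
    return [sum(row[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

# rows[t] holds the row vector P^t(0, :).
rows, row = [], [F(1), F(0), F(0)]
for _ in range(8):
    rows.append(row)
    row = step(row, P)

def mu(i):
    """The row vector mu_i defined just before Lemma 4.3 (its 0th entry is 0)."""
    top, denom = rows[i - 1], rows[i - 1][0] - pi[0]
    return [F(0)] + [(top[0] * pi[j] - pi[0] * top[j]) / denom for j in range(1, len(pi))]

def q(i):
    """The mixing weight q_i of Lemma 4.3."""
    return (rows[i - 1][0] - rows[i][0]) / (rows[i - 1][0] - pi[0])

identity_holds = all(
    step(mu(i), P) == [q(i) * p + (1 - q(i)) * m for p, m in zip(pi, mu(i + 1))]
    for i in range(1, 7)
)
print(mu(1), mu(2), q(1), identity_holds)
```

For this example $\mu_1 = (0, 1/2, 1/2)$, $\mu_2 = (0, 2/5, 3/5)$ and $q_1 = 3/8$, and both sides of the identity for $i = 1$ equal $(1/8, 3/8, 1/2)$.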

This lemma suggests the form for $\hat P$ and $\Lambda$. Let $\hat X$ have state space $\{0, 1, 2, \dots\}$.
Define the transition kernel $\hat P$ by setting $\hat P(0, 0) := 1$ and, for $i > 0$,

$$\hat P(i, 0) := \pi(0) q_i, \qquad \hat P(i, 1) := [1 - \pi(0)] q_i, \qquad \hat P(i, i + 1) := 1 - q_i;$$

we set $\hat P(i, j) := 0$ for all other pairs $(i, j)$. As the following lemma shows, the
hitting time $\hat T_0$ for this chain $\hat X$ has a simple decomposition as a sum of Geometrically
many iid copies of $V$.

Lemma 4.4. Let $\hat X$ have initial distribution $\hat\pi_0 := \pi(0)\delta_0 + [1 - \pi(0)]\delta_1$ and one-step
transition matrix $\hat P$. Then there exist random variables $N$ and $V_1, V_2, \dots$ with joint
distribution as in Theorem 4.2 such that (for every sample path) $\hat T_0 = \sum_{i=1}^{N} V_i$.

Proof. Let $N \ge 0$ denote the number of visits to state 1; and for $i = 1, \dots, N$, let
$V_i$ denote the highest state reached in the time interval $[T_i, T_{i+1})$, where $T_i$ denotes
the epoch of the $i$th visit to state 1. Then all of the assertions of the lemma are clear;
it is perhaps worth noting only that for $t = 0, 1, \dots$ we have

$$P(V_1 > t) = \prod_{k=1}^{t} (1 - q_k) = \frac{P^t(0, 0) - \pi(0)}{1 - \pi(0)}. \qquad □$$

Define $\Lambda$ by setting $\Lambda(0, :) := \delta_0$ and $\Lambda(i, :) := \mu_i$ for $i > 0$. Note that $\Lambda$ has
infinitely many rows, each of which is in $\mathbb{R}^{n+1}$. We then have the following theorem,
whose proof is almost immediate from the definitions and Lemma 4.3.



HITTING TIMES AND INTERLACING EIGENVALUES 






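Theorem 4.5. The quasi-link $\Lambda$ provides a quasi-intertwining of $(\pi, P_{\mathrm{abs}})$ and
$(\hat\pi_0, \hat P)$ and satisfies (1.2), and therefore $\mathcal{L}_\pi T_0 = \mathcal{L}_{\hat\pi_0} \hat T_0$.

Proof. It is easily checked that each row of $\Lambda$ sums to unity. Further, for $i > 0$
the $i$th row of $\Lambda P_{\mathrm{abs}}$ is $\mu_i P_{\mathrm{abs}} = \mu_i P = q_i \pi + (1 - q_i)\mu_{i+1}$ (the first equality holds
because $\mu_i(0) = 0$), which is the $i$th row of $\hat P \Lambda$ because $\pi = \pi(0)\delta_0 + [1 - \pi(0)]\mu_1$. The
0th rows are both $\delta_0$, so we conclude $\Lambda P_{\mathrm{abs}} = \hat P \Lambda$. Similarly, $\pi = \hat\pi_0 \Lambda$. Finally,
since $\delta_0(0) = 1$ and $\mu_i(0) = 0$ for $i > 0$, we have $\Lambda \delta_0^T = \delta_0^T$, which is (1.2). The
equality of hitting-time laws then follows from the discussion in Section 1.3. □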
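The equality of hitting-time laws in Theorem 4.5 can be observed on a finite truncation of $\hat X$. In the sketch below (a hypothetical three-state example, with illustrative names), the infinite chain is cut off at state `size`; since $\hat X$ started in state 1 cannot reach state `size` before time `size - 1`, the distribution function of $\hat T_0$ is computed exactly for all $t \le$ `horizon`.

```python
from fractions import Fraction as F

# Hypothetical example: lazy reversible chain on {0, 1, 2} with uniform stationary law.
P = [[F(3, 4), F(1, 4), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 4), F(3, 4)]]
pi = [F(1, 3)] * 3

def step(row, M):
    """One step of a row vector through the matrix M."""
    return [sum(row[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def absorb_cdf(init, M, horizon):
    """P(T_0 <= t) for t = 0..horizon when state 0 is absorbing under M."""
    out, row = [], list(init)
    for _ in range(horizon + 1):
        out.append(row[0])
        row = step(row, M)
    return out

horizon = 8
size = horizon + 2                    # keep states 0..size of the infinite chain

# Return probabilities P^t(0, 0) and the weights q_1, ..., q_size of Lemma 4.3.
ret, row = [], [F(1), F(0), F(0)]
for _ in range(size + 1):
    ret.append(row[0])
    row = step(row, P)
q = [None] + [(ret[i - 1] - ret[i]) / (ret[i - 1] - pi[0]) for i in range(1, size + 1)]

# Truncated version of the kernel P-hat defined before Lemma 4.4.
P_hat = [[F(0)] * (size + 1) for _ in range(size + 1)]
P_hat[0][0] = F(1)
for i in range(1, size + 1):
    P_hat[i][0] += pi[0] * q[i]
    P_hat[i][1] += (1 - pi[0]) * q[i]
    P_hat[i][min(i + 1, size)] += 1 - q[i]   # last row keeps its leftover mass in place

P_abs = [[F(1), F(0), F(0)]] + P[1:]
init_hat = [pi[0], 1 - pi[0]] + [F(0)] * (size - 1)
cdf_given = absorb_cdf(pi, P_abs, horizon)
cdf_hat = absorb_cdf(init_hat, P_hat, horizon)
print(cdf_given[:3], cdf_hat[:3])
```

The self-loop placed in the last row is purely a truncation device and plays no role in the comparison up to time `horizon`.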

Note that $\Lambda$ is a link (in which case sample-path linking is possible) if and
only if for every $t \ge 0$ the $t$-step transition probability $\tilde P^t(i, 0)$ is maximized when
$i = 0$; here $\tilde P$ is the time-reversed transition matrix $\tilde P(i, j) := \pi(j)P(j, i)/\pi(i)$. A
sufficient condition for this is that the state space is partially ordered, 0 is either a
top element or a bottom element, and $\tilde P$ is stochastically monotone.

Remark 4.6. The intertwining constructed in Lemma 4.3 and Theorem 4.5 can
be related to the fastest strong stationary time construction of [5] and the
corresponding strong stationary dual constructed in Example 2.6 of [9]. In the interest
of brevity, we omit an explanation of the connection.

Remark 4.7. We claim that if $P$ is such that $\tilde P^t(i, 0)$ is maximized for every $t$ when
$i = 0$, then $P$ automatically satisfies the assumption in Theorem 4.2 that $P^t(0, 0)$
is nonincreasing in $t$. To see this, consider the chain $X$ with transition matrix $P$
started in distribution $\mu_1 = [1 - \pi(0)]^{-1}[\pi - \pi(0)\delta_0]$. Then, for any state $i$,

$$\frac{P(X_t = i)}{\pi(i)} = \frac{1 - \tilde P^t(i, 0)}{1 - \pi(0)}.$$

If $s(t)$ is the separation of the chain at time $t$, then $1 - s(t)$ equals the minimum
of this ratio over $i$, namely, $[1 - P^t(0, 0)]/[1 - \pi(0)]$. It is well known (e.g., [1,
Chapter 9]) that separation is nonincreasing in $t$, so $P^t(0, 0)$ is nonincreasing.

Appendix A. $P^*$ when $P$ is a star chain

In Remark 3.6 it is claimed that if the given chain $P$ is a star chain, then the star
chain of Lemma 3.4 is simply obtained by collapsing all leaves with the same one-step
transition probability to state 0 into a single leaf. More precisely, we establish
the following:

Proposition A.1. Let $P$ be the transition matrix of an ergodic star chain with hub
at 0. If for each $\gamma_i$ in the reduced set of eigenvalues of $P_0$ we define

$$m(i) := \{j \in [n] : \eta_j = \gamma_i\},$$

then $P^*(0, i) = \sum_{j \in m(i)} P(0, j)$.

Proof. Define $H := \mathrm{diag}(\eta_1, \dots, \eta_n)$ and

$$x := (P(0, 1), \dots, P(0, n)), \qquad y := (1 - \eta_1, \dots, 1 - \eta_n),$$

so that

$$P = \begin{pmatrix} P(0, 0) & x \\ y^T & H \end{pmatrix}.$$

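By the standard formula for the determinant of a partitioned matrix (e.g., [18,
Section 0.8.5]), if $t$ is not in the spectrum $\{\eta_1, \dots, \eta_n\}$ of $H$ then we find

$$\det(tI - P) = [t - P(0, 0) - x(tI - H)^{-1} y^T] \det(tI - H) \tag{A.1}$$

for the characteristic polynomial of $P$. Analogously, define $\Gamma := \mathrm{diag}(\gamma_1, \dots, \gamma_r)$
and

$$x^* := (P^*(0, 1), \dots, P^*(0, r)), \qquad y^* := (1 - \gamma_1, \dots, 1 - \gamma_r);$$

if $t$ is not in the spectrum $\{\gamma_1, \dots, \gamma_r\}$ of $\Gamma$, then we find

$$\det(tI - P^*) = [t - P^*(0, 0) - x^*(tI - \Gamma)^{-1} y^{*T}] \det(tI - \Gamma) \tag{A.2}$$

for the characteristic polynomial of $P^*$.

Note that

$$P(0, 0) = \mathrm{tr}\, P - \mathrm{tr}\, H = \sum_{i=0}^{n} \theta_i - \sum_{i=1}^{n} \eta_i = \sum_{i=0}^{r} \lambda_i - \sum_{i=1}^{r} \gamma_i = \mathrm{tr}\, P^* - \mathrm{tr}\, \Gamma = P^*(0, 0), \tag{A.3}$$

where the third equality is a result of the eigenvalue reduction procedure discussed
in Section 3.1 and the fourth equality is from Lemma 2.6 in [6]. Similarly, for all
$t \notin \{\eta_1, \dots, \eta_n\}$ we have

$$\frac{\det(tI - P)}{\det(tI - H)} = \frac{\det(tI - P^*)}{\det(tI - \Gamma)}. \tag{A.4}$$

Therefore, for all $t \notin \{\eta_1, \dots, \eta_n\}$ we have

$$\sum_{i=1}^{n} P(0, i)\, \frac{1 - \eta_i}{t - \eta_i} = \sum_{i=1}^{r} P^*(0, i)\, \frac{1 - \gamma_i}{t - \gamma_i}, \tag{A.5}$$

because using definitions of $H, x, y, \Gamma, x^*, y^*$ and equations (A.1)-(A.4) we find

$$t - P(0, 0) - \sum_{i=1}^{n} P(0, i)\, \frac{1 - \eta_i}{t - \eta_i} = \frac{\det(tI - P)}{\det(tI - H)} = \frac{\det(tI - P^*)}{\det(tI - \Gamma)} = t - P^*(0, 0) - \sum_{i=1}^{r} P^*(0, i)\, \frac{1 - \gamma_i}{t - \gamma_i}.$$

Rewrite (A.5) as

$$\sum_{i=1}^{r} P^*(0, i)\, \frac{1 - \gamma_i}{t - \gamma_i} = \sum_{i=1}^{r} \Biggl( \sum_{j \in m(i)} P(0, j) \Biggr) \frac{1 - \gamma_i}{t - \gamma_i}.$$

Since $\gamma_1, \dots, \gamma_r$ are distinct, it follows easily that $P^*(0, i) = \sum_{j \in m(i)} P(0, j)$ for
$i = 1, \dots, r$, as desired. □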

Let $\pi$ be the stationary distribution for $P$. Using the formula for $P^*(0, i)$ provided
by Proposition A.1, it is a simple matter to check that the probability mass
function $\pi^*$ defined by $\pi^*(0) := \pi(0)$ and $\pi^*(i) := \sum_{j \in m(i)} \pi(j)$ for $i \ne 0$ satisfies
the detailed balance condition and is therefore the stationary distribution for $P^*$:
indeed, using the reversibility of $P$ with respect to $\pi$ we have

$$\begin{aligned}
\pi^*(0) P^*(0, i) &= \pi(0) \sum_{j \in m(i)} P(0, j) = \sum_{j \in m(i)} \pi(j) P(j, 0) \\
&= \sum_{j \in m(i)} \pi(j)(1 - \gamma_i) = \pi^*(i) P^*(i, 0).
\end{aligned}$$